PREFACE

This Memorandum was written for the Office of the Assistant Secretary of Defense (OASD), Systems Analysis. It supplements material presented in the proposed OASD handbook, An Introduction to Military Hardware Cost Analysis.

The cost analyst draws heavily on formal statistics in general and regression analysis in particular when developing estimating relationships. There are numerous times when, because of data limitations, he
must  instead  rely  on  mechanical  curve  fitting  and  the  development  of 
empirical  equations. 

The  material  presented  here  is  intended  to  provide  the  practicing 
cost  analyst  with  a  basic  knowledge  of  the  mechanics  of  curve  fitting 
and  of  the  properties  of  the  equations  he  uses.  For  the  most  part,  the 
choice  of  material  reflects  an  attempt  to  answer  those  questions  which 
years  of  experience  have  shown  to  be  most  common  and  troublesome. 

Similar  information  can  be  found  in  other  sources,  but  to  the  best 
of  the  author's  knowledge  does  not  exist  in  any  single  source.  The 
integration  of  analytic  geometry  with  curve-fitting  methods  and  the 
selection of the material itself are the unique features of this presentation. The mathematical discussions are purposely intuitive. They
are  intended  to  be  understandable  by  persons  having,  at  best,  limited 
mathematical  training. 

Special-purpose  functional  forms  and  curve-fitting  methods  have 
not  been  included.  Only  those  forms  and  methods  that  have  already 
proven to be of general use to the cost analyst are described here.

Although the computational schemes presented in the Memorandum
are  for  the  most  part  suitable  for  the  desk  calculator,  high-speed 
digital  computers  are  widely  available  today,  and  should  be  used  when 
possible.

SUMMARY

Much of the difficulty cost analysts have with curve fitting results from an inadequate grounding in the analytic geometry of the empirical equations with which they work. This Memorandum attempts to provide a concise but relatively thorough discussion of this subject while at the same time demonstrating selected methods for mechanical curve fitting.

The material is presented in three parts. Section I discusses the properties of the straight line, the exponential, the power function, and the parabola. Included in the discussion of the exponential are the laws of exponents and hence logarithms. Emphasis is on providing insights into the impact of the parameter values on the form of the resultant curves. Graphical illustrations are used extensively.

Section II presents different methods of using these curves to describe the relationship between two variables. It discusses the method of selected points, the method of averages, and the method of least squares, making considerable use of scatter diagrams. It describes a number of measures of goodness of fit including the standard deviation, the coefficient of variation, and an average percent deviation. Throughout this section, computational procedures are carried out in complete detail.

The discussion of curve fitting is continued in Section III, where cases with more than two variables are considered. By using the method of successive approximations, the initial discussion attempts to convey the idea of a net relationship between two variables, eliminating the influence of any others, and thus to clarify the meaning of the coefficients in the multivariate linear equation. The method of least squares is shown to produce the same results as did the method of successive approximations and with significantly less computational effort. The discussion turns next to the nonlinear case. Each of the functional forms described earlier (the exponential, the power function, and the parabola) is used to describe a nonlinear relationship between three variables. Although the method of successive approximations may be used for fitting curves to nonlinear relationships, only the method of least squares is described.

The  decision  to  discuss  the  analytics  and  the  curve-fitting  methods 
in  separate  sections  of  the  Memorandum  was  purely  arbitrary.  For  many 
purposes,  the  user  of  this  material  will  want  to  combine  his  readings 
in  the  first  section  with  his  readings  in  subsequent  sections.  The 
parallel nature of the presentations in each section was designed to facilitate this.

I. CURVE FITTING AND EMPIRICAL EQUATIONS


INTRODUCTION 

An effective cost analysis capability cannot exist without the systematic collection and analysis of data on past, present, and projected programs. The analysis must result in the development of estimating relationships which can be used as a basis for estimating the resource impact of future proposals. These relationships typically relate resource requirements to the physical, performance, or operational characteristics of the system, and are, in essence, formal statements of the way one or more variables relate to each other. At times a simple factor such as cost per mile is all that is needed. Frequently, however, a more complex relationship is required such as the one between the weight and cost of an aircraft and the manhours required to maintain it. The necessary relationships are usually expressed as mathematical equations, as curves drawn on coordinate paper, or both. In either case, the methods of curve fitting are essential to the development process.

Just what do we mean by curve fitting? Suppose we plot a set of corresponding values of two variables on coordinate paper. The problem of curve fitting is that of finding the equation of a curve that passes through (or near) these points in the graph so as to indicate their general trend. An equation determined in this way is called an empirical equation between two variables, and the process of finding it is called curve fitting. While curve fitting as such deals primarily with relationships between two variables, certain of the basic methods can be used to establish empirical equations among three or more variables. In cost analysis it is often useful to show the relationship between two variables in the form of a curve even when there is no mathematical expression possible; the methods of curve fitting can be used to establish such curves.

One of the interesting things about curve fitting is that there are so many different ways to do it, and a review of the literature on the subject would lead one to believe that there are as many methods as there are authors and problems. In fact, we may reasonably conclude that curve fitting is more of an art than a science. Fortunately, however, many of the methods are useful in solving only a limited number of unique problems, and for that reason are not of interest to us here. It is the intent of this Memorandum to explain curve fitting in a general sense and to present only those methods that experience has shown to be of general use to the cost analyst.

SOME  BASIC  ANALYTIC  GEOMETRY 

This section discusses the mathematical properties of some functional forms, the general shape of the curves portrayed by each, and the relationship between the shape of the curve, its location, and the values of the equation constants. Since our greatest concern at this point is to develop the equation describing a particular relationship, we will present the techniques for calculating the equation parameters, both constants and coefficients.

The different functional forms that will be treated are summarized below:

    y = a + bx,            straight line,
    y = a + bx + cx^2,     parabola,
    y = ab^x,              exponential,
    y = ax^b,              power function.


These suffice for most cost analysis problems with two variables. Although there are other forms that are frequently used, they generally can be related to those given above through appropriate scale transformations.


Straight  Line 

The  straight  line  is  certainly  the  simplest  functional  form  to 
deal  with.  It  is  completely  defined  by  knowing  any  two  points  on 
the  line.  The  main  feature  of  any  straight  line  is  the  slope  or 
tilt  of  the  line.  If  the  line  rises  reading  from  left  to  right  as 
in Fig. 1, from point (x1,y1) to point P(x,y), the slope is said to be positive;
if  the  line  falls  reading  from  left  to  right,  the  slope  is  said  to  be 
negative.  The  actual  value  of  the  slope  (b)  is  the  ratio  of  a  change 
in  y  to  a  related  change  in  x,  that  is,  the  ratio  of  the  length  of 
the  vertical  dashed  line  to  the  length  of  the  horizontal  dashed  line 
in  Fig.  1. 

If we are given any point on a line, say (x1,y1) as in Fig. 1, and the slope (b), we would be able to deduce the equation of the line. For any other point P(x,y) on the line, this may be expressed symbolically as

    (y - y1)/(x - x1) = b,

or

    (y - y1) = b(x - x1),

Fig. 1 -- Straight line with known slope drawn through point (x1,y1)
which  is  known  as  the  Point-Slope  form  of  the  equation  of  a  straight 
line.  If  we  are  given  one  point  and  the  slope,  we  would  immediately 
be able to substitute in the above and have the equation of the line.

A slight modification of this results when, as in Fig. 2, the
slope  is  not  known  directly,  but  two  points  on  the  line  are  given. 
Then  the  slope  can  be  calculated  as  follows: 

    b = (y2 - y1)/(x2 - x1),

and this value may in turn be substituted into the Point-Slope formula. This particular form is probably used more than any other in fitting straight lines.

There are other instances when, as in Fig. 3, the slope and the intercept are known. The intercept, more properly called the y intercept, is the point where the line crosses the y axis. This point is identified in Fig. 3 as P(0,a). Because the coordinates of the intercept are as useful as the coordinates of any other point, we may use them to write the equation of the line. The Point-Slope formula for a straight line is used and the result is

    (y - a) = b(x - 0),

which simplifies to y = a + bx, where both the intercept (a) and the slope (b) are immediately recognizable. This is known as the Slope-Intercept formula for the equation of a straight line.

Fig. 3 -- Straight line with y intercept at a and passing through point (x1,y1)
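The two-point slope calculation and the Slope-Intercept form can be combined in a few lines of code. The following is a minimal sketch, not part of the original computational schemes; the function name line_through and the sample points are hypothetical and chosen only for illustration.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = a + b*x through two points.

    The points must have different x coordinates; a vertical line
    has no finite slope and cannot be written in this form.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    b = (y2 - y1) / (x2 - x1)  # slope from two points
    a = y1 - b * x1            # Point-Slope form evaluated at x = 0
    return a, b


# Hypothetical points used only to exercise the function.
a, b = line_through((1.0, 5.0), (3.0, 9.0))
print("intercept a =", a, "slope b =", b)
```

Computing the intercept from the Point-Slope form keeps the sketch parallel to the derivation in the text: the slope is found first, and the intercept follows by setting x equal to 0.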

The remaining case is where both the x and the y intercepts are known. As in Fig. 4, a is the value of y when x is equal to 0, and c is the value of x when y is equal to 0. Notice that, in this case, the line slopes downward from left to right so that we would expect the slope to be negative. Writing the equation for calculating the slope using the modified Point-Slope formula, we have

    b = (0 - a)/(c - 0) = -a/c,

so that

    y = a - (a/c)x.

Fig. 4 -- Straight line showing x and y intercepts

Notice that the y intercept is equal to a and the slope is equal to -a divided by c. This form of the formula for a straight line is called the Intercept form. Another form of this equation which results from a slight rearrangement is

    x/c + y/a = 1.

In this form the coordinates of the two intercepts are immediately recognizable.
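As a small illustration of the Intercept form, the sketch below computes the slope from the two intercepts and tests whether a point satisfies x/c + y/a = 1. The function names and the sample intercepts are assumptions made for this example only.

```python
def line_from_intercepts(a, c):
    """Return (a, b) for y = a + b*x given y intercept a and x intercept c."""
    if a == 0 or c == 0:
        raise ValueError("both intercepts must be nonzero for the Intercept form")
    b = -a / c  # the line falls from (0, a) to (c, 0) when a and c share a sign
    return a, b


def on_line(x, y, a, c, tol=1e-9):
    """True if (x, y) satisfies the Intercept form x/c + y/a = 1."""
    return abs(x / c + y / a - 1.0) <= tol


# Hypothetical intercepts: the line crosses the y axis at 6 and the x axis at 3.
a, b = line_from_intercepts(6.0, 3.0)
print("slope =", b)
print(on_line(1.0, 4.0, 6.0, 3.0))
```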

The next figure, Fig. 5, shows two special cases of the straight line, the line parallel to the x axis and the line parallel to the


f  ixed 


is at the origin of the coordinate axes and the line of symmetry of the curve is the x axis. Referring to the definition, we find that when the value of y is 0 (which is the case at the origin), the directrix is the same distance to the left of the origin as the focus is to the right. Further, assuming the distance from the directrix to the focus on the line of symmetry is p, the coordinates of the focus are by definition (p/2,0), and similarly the equation of the directrix is x = -p/2.

Letting P be any point on the parabola and setting the distances FP and PL equal to each other, according to the definition, we can derive the equation for parabolas symmetrical to the x axis, with the vertex at the origin, and opening outward to the right. Using the standard distance formula* for determining the length of the two lines FP and PL, and setting one equal to the other, we obtain

    FP = sqrt[(y - 0)^2 + (x - p/2)^2],

    PL = sqrt[(x + p/2)^2],

    sqrt[(y - 0)^2 + (x - p/2)^2] = sqrt[(x + p/2)^2].

Fig. 7 -- Parabola with vertex at the origin, opening to the right

* The distance between any two points on rectangular coordinates may be calculated by using the following formula:

    d = sqrt[(y1 - y2)^2 + (x1 - x2)^2],

where d = the required distance,
      (x1,y1) = the coordinates of the first point,
      (x2,y2) = the coordinates of the second point.

When both sides of this expression are squared and the result expanded, we have

    y^2 + x^2 - px + p^2/4 = x^2 + px + p^2/4,

which simplifies to

    y^2 = 2px,

and is the equation of the parabola shown in Fig. 7. If the parabola were positioned as pictured in Fig. 8, the equation could be obtained by using the same method:

    x^2 = 2py,

which is the equation for a parabola with its vertex at the origin, symmetrical to the y axis, and opening upward. Notice that the ninety-degree rotation, as was made between Fig. 7 and Fig. 8, caused the x and the y terms to be interchanged. Otherwise the two equations are identical.
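The defining property behind this derivation, that every point of the parabola is equally distant from the focus and the directrix, can be checked numerically. The sketch below assumes the parabola y^2 = 2px with focus (p/2, 0) and directrix x = -p/2; the function name and the value of p are hypothetical.

```python
import math


def focus_directrix_gap(x, y, p):
    """Absolute difference between FP and PL for the point (x, y).

    F is the focus (p/2, 0); PL is the perpendicular distance from the
    point to the directrix x = -p/2. The gap is zero on the parabola.
    """
    fp = math.hypot(x - p / 2.0, y)  # distance formula applied to F and P
    pl = abs(x + p / 2.0)            # horizontal distance to the directrix
    return abs(fp - pl)


p = 3.0  # hypothetical focus-to-directrix distance
for y in (-4.0, -1.0, 0.0, 2.0, 5.0):
    x = y ** 2 / (2.0 * p)  # a point chosen to lie on y^2 = 2px
    print(x, y, focus_directrix_gap(x, y, p))
```

Interchanging the roles of x and y in the same function gives the corresponding check for the parabola that opens upward.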

When the vertex of the parabola is shifted away from the origin, as in Fig. 9, the equation will again be altered. To show how, we regard the problem as one of shifting the intersection of the axes of the coordinate system from the point (h,k) to the point (0,0). To make the translation we set

    x = x' + h  or  x' = x - h,
    y = y' + k  or  y' = y - k,

where x and y refer to the original axes; x' and y' refer to the axes whose center coincides with the vertex of the parabola; and h and k are the coordinates of the origin of the x', y' axes measured from the x, y axes.

Fig. 8 -- Parabola with vertex at the origin, symmetrical to the y axis, opening upward

When we substitute (x - h) for x' and (y - k) for y' in the equation y'^2 = 2px', we have

    (y - k)^2 = 2p(x - h),

which when expanded yields

    y^2 - 2ky + k^2 = 2px - 2ph,
    y^2 - 2px - 2ky + 2ph + k^2 = 0.

Because in all cases h, k, and p will be constants, the equation may be written as follows:

    y^2 + Dx + Ey + F = 0,

where D = -2p,
      E = -2k,
      F = 2ph + k^2.

This is the standard form of the equation for all parabolas symmetrical to a line parallel to the x axis.

If instead of y'^2 = 2px' we had started with x'^2 = 2py', we would arrive at the standard form of the equation as follows: Substituting (y - k) and (x - h) for y' and x' respectively gives us

    (x - h)^2 = 2p(y - k),

which when expanded yields

    x^2 - 2xh + h^2 - 2py + 2pk = 0.

After substituting as above, we have

    x^2 + Dy + E'x + F' = 0,

where D = -2p,
      E' = -2h,
      F' = 2pk + h^2.

This is the standard form of the equation for all parabolas symmetrical to a line parallel to the y axis.

If we take each of the two standard forms in turn, shift the terms and divide through appropriately, we arrive at the following equations:

    -(1/D)y^2 - (E/D)y - F/D = x,

    -(1/D)x^2 - (E'/D)x - F'/D = y.

As each of the coefficients in the above expressions is a constant, we can make further substitutions and obtain either

    Ay^2 + By + C = x,

or

    Ax^2 + Bx + C = y,

where A = -1/D,
      B = -E/D or -E'/D,
      C = -F/D or -F'/D.
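The chain of substitutions from the vertex (h,k) and the distance p to the coefficients A, B, and C is easy to trace in code. The sketch below does so for the parabola symmetrical to a line parallel to the y axis, (x - h)^2 = 2p(y - k). Solving x^2 + Dy + E'x + F' = 0 for y gives A = -1/D, B = -E'/D, and C = -F'/D; the function name and the sample values are assumptions made for illustration.

```python
def parabola_coefficients(h, k, p):
    """Return (A, B, C) with y = A*x**2 + B*x + C for (x - h)**2 = 2*p*(y - k)."""
    if p == 0:
        raise ValueError("p must be nonzero")
    D = -2.0 * p
    E = -2.0 * h                # E' in the text
    F = 2.0 * p * k + h ** 2    # F' in the text
    # Solving x**2 + D*y + E*x + F = 0 for y gives the three coefficients.
    return -1.0 / D, -E / D, -F / D


# Hypothetical vertex (1, 2) with p = 0.5, i.e. y = (x - 1)**2 + 2.
A, B, C = parabola_coefficients(1.0, 2.0, 0.5)
print(A, B, C)
print("y at the vertex:", A * 1.0 ** 2 + B * 1.0 + C)
```

A quick way to confirm the signs is to evaluate the result at x = h, where y must equal k.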

These are the forms of the parabola that are most commonly used in curve fitting. Since there are three coefficients, or unknowns, at least three points must be known to define the curve. Given three points on a parabola, the equation may be obtained by using the coordinates of each of the points to obtain an equation of the above form and then solving the three equations simultaneously for A, B, and C.

To illustrate, we are given the three points (0,2), (3,4), and (4,12). Plotting these points as in Fig. 10 leads us to believe a parabola opening upward and symmetrical to a line parallel to the y axis would be the correct form to fit. The standard form of the equation for this type of parabola is

    y = Ax^2 + Bx + C.

Fig. 10 -- Points used to illustrate fitting the parabola

Substituting each of the three points in this expression allows us to write the following three equations. Notice that the x coordinate must be squared to make certain of the substitutions:

     2 =  0A + 0B + C,
     4 =  9A + 3B + C,
    12 = 16A + 4B + C.

It is obvious from the first equation that C is equal to 2. Using this knowledge to adjust the two remaining equations will reduce the problem significantly. In this case, the two remaining equations are

     2 =  9A + 3B,
    10 = 16A + 4B.

There are a number of ways to solve simultaneous equations. Probably the simplest for only two equations is the determinant method. As the number of variables and the number of equations get larger, however, other methods are preferred. In fact, when four or more equations are involved, it is probably best to look for computer programs to do the job. The determinant method is particularly well adapted to the desk calculator, but not particularly well suited for illustrative purposes. Here we will divide by the leading coefficients and eliminate variables by subtraction.

Dividing the first equation by 9, the second by 16, and subtracting the first equation from the second, A is eliminated. These steps follow:

    A + (3/9)B = 2/9

and

    A + (4/16)B = 10/16.

Subtracting the first from the second, we get

    (1/4 - 1/3)B = (10/16 - 2/9).

The necessary simplifications and other arithmetic having been performed,

    B = -29/6.

We next substitute the value of B into the first of the two variable equations and calculate A as follows:

    2 = 9A + 3(-29/6),
    A = 11/6.

The required coefficients are now seen to be

    A = 11/6,  B = -29/6,  C = 2.

It is usually good practice to substitute all of these coefficients into one of the original equations to test the correctness of the arithmetic. Substituting in the second equation we have

    4 = 9(11/6) + 3(-29/6) + 2.

The required arithmetic shows us that the values of the coefficients calculated are in fact correct. The equation that we have been looking for is therefore

    y = (11/6)x^2 - (29/6)x + 2.

Solving this equation for y given a range of values of x and plotting them, allows us to draw the curve shown in Fig. 11. Contrary to our expectation, this form of parabola is not a good representation of the relationship implied by the three points. This example illustrates an inherent difficulty associated with using the parabola. If we had not examined the characteristics of this curve over the relevant values of x, the fact that y is negative between about x = 1/2 and x = 2 would not have been noticed and could have led to absurd cost estimates.
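Where a computer is available, the same three-point fit can be carried out by straightforward elimination. The sketch below is one possible arrangement, not the procedure of the text: it uses exact fractions so that no rounding enters, and the function name parabola_through is hypothetical. It also evaluates the fitted curve between the data points, which is the kind of examination the paragraph above recommends.

```python
from fractions import Fraction


def parabola_through(points):
    """Return (A, B, C) for y = A*x**2 + B*x + C through three points.

    The system is solved by Gauss-Jordan elimination on exact fractions.
    """
    rows = [[Fraction(x) ** 2, Fraction(x), Fraction(1), Fraction(y)]
            for x, y in points]
    for col in range(3):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, 3) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("points do not determine a unique parabola")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]  # divide by leading coefficient
        for r in range(3):
            if r != col:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return rows[0][3], rows[1][3], rows[2][3]


A, B, C = parabola_through([(0, 2), (3, 4), (4, 12)])
print(A, B, C)

# Evaluate the fitted curve at intermediate x values to look for negative y.
for x in (Fraction(1, 2), 1, Fraction(3, 2), 2, Fraction(5, 2)):
    print(x, A * x ** 2 + B * x + C)
```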

In curve fitting we are concerned primarily with the best representation of the data at hand. In cost estimating we are typically concerned with extrapolation beyond the range of the existing data.

When we choose a parabola to represent a relationship between two sets of data, we generally use only a limited segment of the entire curve. Figure 12 illustrates how this fact can lead to trouble. The boxed-in segments of the curve show the part of the curve used to describe the data. Examination of the curves outside the limits of the various boxes shown in Figs. 12a, 12b, and 12c indicates the kind of trouble one can get into by using this type of curve for making extrapolations.

(12c) Another parabola opening downward

There are times when the best parabolic function to represent a set of data is of the form

    Ay^2 + By + C = x.

Since it is conventional for y to be the dependent variable, this equation causes some difficulty. One way this difficulty can be overcome, after fitting the curve, is to solve the resultant equation for y, using the quadratic formula. First the equation must be rewritten as follows:

    Ay^2 + By + (C - x) = 0.

Then, using the quadratic formula,

    y = [-B ± sqrt(B^2 - 4A(C - x))]/(2A).

Use of this formula will probably result in two solutions because the square root of a number can be either positive or negative. Each equation must be evaluated to determine which is appropriate.
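The two solutions produced by the quadratic formula can be generated together and then inspected. The following sketch assumes the fitted equation has the form Ay^2 + By + C = x; the function name solve_for_y and the sample coefficients are hypothetical.

```python
import math


def solve_for_y(A, B, C, x):
    """Return the real solutions y of A*y**2 + B*y + (C - x) = 0, smallest first.

    An empty tuple means the curve has no point at this value of x.
    When the discriminant is zero the two returned values coincide.
    """
    if A == 0:
        raise ValueError("A must be nonzero for a parabola")
    disc = B * B - 4.0 * A * (C - x)
    if disc < 0:
        return ()
    root = math.sqrt(disc)
    y_minus = (-B - root) / (2.0 * A)
    y_plus = (-B + root) / (2.0 * A)
    return tuple(sorted((y_minus, y_plus)))


# Hypothetical curve x = y**2 (A = 1, B = 0, C = 0) evaluated at x = 4.
print(solve_for_y(1.0, 0.0, 0.0, 4.0))
```

Which of the two values is appropriate depends on which branch of the parabola was used to describe the data, so the choice is left to the analyst rather than made in the code.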

Exponential

The general form of the exponential equation is y = ab^x.* Graphs of two exponential equations that differ from each other only with respect to the value of b are shown in Fig. 13. In each case a has been set equal to 1. As will be shown, only the level of the exponential is affected by the value of a.

A graph similar to that shown in Fig. 13a results whenever b is greater than 1, and a graph similar to that in Fig. 13b if b is between 1 and 0. If b is equal to 1 the exponential equation becomes

    y = 1,

for 1 raised to any power is equal to 1. If b is 0 there is no equation, for 0 raised to any power is 0, and consequently

    y = 0.

When b is negative (less than 0), the exponential is discontinuous and, for that reason, of no value to us for curve fitting.

* In this text, the function with the independent variable x as the exponent is called the exponential, while the function y = ax^b, where the exponent is a constant, is called the power function.


(13a) Exponential with b > 1    (13b) Exponential with 0 < b < 1

Fig. 13 -- Negative and positive exponential curves

For our purposes, the fact that the exponential curve rises from left to right when b is greater than 1 and from right to left when b is between 1 and 0 is the relevant characteristic. Also notice that both curves pass through the point y = 1, x = 0.

The influence of a is illustrated in Fig. 14. Larger values tend to raise the curve while lower values cause a downward shift. When x is equal to 0, b^0 = 1, and the exponential becomes

    y = a.

Consequently, a may be thought of as the y intercept.

(14a) Exponential with b > 1    (14b) Exponential with 0 < b < 1

Fig. 14 -- The effect of the values assigned to a on the level of the exponential
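The roles of the two constants can be seen by tabulating a few values. This is only a sketch with hypothetical values of a and b; the function name exponential is an assumption of the example.

```python
def exponential(a, b, x):
    """Evaluate y = a * b**x for a positive base b."""
    if b <= 0:
        raise ValueError("b must be greater than 0")
    return a * b ** x


# At x = 0 the value equals a, whatever the base.
print(exponential(3.0, 2.0, 0), exponential(3.0, 0.5, 0))

# With b > 1 the values rise from left to right; with 0 < b < 1 they fall.
for b in (2.0, 0.5):
    print(b, [exponential(1.0, b, x) for x in (-2, -1, 0, 1, 2)])
```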

Since facility with the exponential requires an understanding of exponents and logarithms, we will digress temporarily to review these topics. The system of exponents is based entirely on five basic laws and four definitions. The first definition states that the expression a^n, where n is an exponent and a is greater than 0, is the product of a multiplied by itself n times:

    a^2 = a x a,
    a^3 = a x a x a,
    etc.

The first law of exponents states that the product of a^m and a^n is a^(m+n), which incidentally follows directly from the initial definition. To illustrate:

    a^2 x a^3 = a^(2+3) = a^5,
    a^2 = a x a,  a^3 = a x a x a,
    (a x a)(a x a x a) = (a x a x a x a x a) = a^5.


Each of the other laws can be similarly derived and it would be a worthwhile exercise for the reader to do so. All five laws are summarized below:

    I.    a^m x a^n = a^(m+n)
    II.   a^m / a^n = a^(m-n)
    III.  (a^m)^n = a^(mn)
    IV.   (ab)^n = a^n x b^n
    V.    (a/b)^n = a^n / b^n

Three additional definitions complete the system:* a^0 is defined as 1, a^(-n) is defined as 1/a^n, and a^(1/n) is defined as the nth root of a. The root is positive if a is positive, and negative if a is negative and n is odd. This system not only gives meaning to the expression a^x when a is greater than 0 and x is any rational number, but also provides the inputs essential to a discussion of logarithms.
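The five laws and the three additional definitions can be spot-checked numerically. The sketch below compares the two sides of each statement for one hypothetical choice of a, b, m, and n; agreement for one set of values is an illustration, not a proof.

```python
import math


def exponent_checks(a, b, m, n):
    """Return {label: (left side, right side)} for the laws and definitions."""
    return {
        "I": (a ** m * a ** n, a ** (m + n)),
        "II": (a ** m / a ** n, a ** (m - n)),
        "III": ((a ** m) ** n, a ** (m * n)),
        "IV": ((a * b) ** n, a ** n * b ** n),
        "V": ((a / b) ** n, a ** n / b ** n),
        "zero exponent": (a ** 0, 1.0),
        "negative exponent": (a ** (-n), 1.0 / a ** n),
        "fractional exponent": ((a ** (1.0 / n)) ** n, a),
    }


# Hypothetical positive values used only for the comparison.
for label, (left, right) in exponent_checks(2.5, 1.7, 3, 4).items():
    print(label, math.isclose(left, right))
```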

The logarithm of a number is the power to which a base number must be raised to equal the original number; it can be more conveniently expressed as

    y = a^x,

where x is the logarithm of y to the base a. In the language of logarithms we would write

    x = log_a y.

The logarithm x is also an exponent. From this and our earlier discussion of exponents, we conclude, and rightly so, that any rational number greater than 0 can be the base of a system of logarithms. In actual practice, however, 10 and the constant e** are most commonly used. When the base is 10, the logarithms are called common logarithms (logs), and when the base is e they are called natural or Napierian logarithms. We shall follow the general practice of using the abbreviation log where 10 is the base and ln where e is the base. Tables of each are readily available.

* It should also be pointed out that such definitions seem logical from the law of division. That is,

    1 = a^n / a^n = a^(n-n) = a^0

and

    1/a^n = a^0 / a^n = a^(0-n) = a^(-n).

** The constant e is the limit of the expression (1 + y)^(1/y) as y approaches 0; the limit is equal to 2.7183 to five significant figures. It is one of the most important limits in calculus.

Any rational number greater than 0 can be expressed in terms of its logarithm and consequently in terms of 10 or e. Expressing a relationship in terms of e leads to simplification both of form and of required computations. Suppose, for example, we have a number, y, which we wish to express in terms of e. We would only have to find ln y in a table of natural logarithms to write

    ln y = x,

or in exponential form,

    y = e^x.

Figure 15(a) shows us that these two equations have exactly the same graph, as do the equations

    ln x = y

and

    x = e^y.

Fig. 15(b) provides similar information for reciprocal relationships.

Interchanging the x and y terms does, however, cause an exchange of coordinate axes.

To express the exponential y = 16.5^x in terms of e we treat the number 16.5 as e^k and write

    16.5 = e^k

and

    ln 16.5 = k.

From a table of natural logarithms we find that ln 16.5 is approximately 2.80 and we write either

    ln 16.5 = 2.80,

or

    16.5 = e^2.80.

Substituting in the original exponential equation and applying the third law of exponents we obtain

    y = (e^2.80)^x

and

    y = e^(2.80x).
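Where a table of natural logarithms is not at hand, the conversion of an exponential to base e can be done directly. The sketch below uses the standard library function math.log, which returns the natural logarithm; the function name to_base_e is hypothetical.

```python
import math


def to_base_e(b):
    """Return k such that b**x equals e**(k*x) for every x; requires b > 0."""
    if b <= 0:
        raise ValueError("b must be greater than 0")
    return math.log(b)  # natural logarithm of the base


k = to_base_e(16.5)
print("k =", k)

# The two forms of the exponential agree at any x we care to try.
for x in (-1.0, 0.5, 2.0):
    print(x, 16.5 ** x, math.exp(k * x))
```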


When the exponential is written as y = e^x, the slope of the curve at any point is equal to the value of the expression at that point.

When the exponential is written with any other base, y = ab^x, the slope is proportional to, but not equal to, the value of the expression at the point; i.e., slope = ky, where the constant k is ln b.
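The proportionality between slope and value can be examined numerically by estimating the slope from two nearby points on the curve and dividing by the value of the function. The sketch below does this with a small symmetric step; the function name slope_ratio, the step size h, and the sample constants a = 5 and b = 3 are all assumptions made for illustration.

```python
import math


def slope_ratio(a, b, x, h=1e-6):
    """Estimate slope/value for y = a * b**x at the point x.

    The slope is approximated by the straight line joining the points
    at x - h and x + h, in the manner of the two-point slope formula.
    """
    def f(t):
        return a * b ** t

    slope = (f(x + h) - f(x - h)) / (2.0 * h)
    return slope / f(x)


# The ratio is the same at every x and agrees with ln b.
for x in (-1.0, 0.0, 2.0):
    print(x, slope_ratio(5.0, 3.0, x), math.log(3.0))

# With base e the ratio is 1: slope and value coincide.
print(slope_ratio(1.0, math.e, 1.0))
```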

For example, Fig. 16 shows the graph of the expression y = 2(2^x), which is an exponential of the form y = ab^x. Since this is not written in terms of e, we would expect the slope at any point to be proportional to the value of the expression at that point. We can check this, at least approximately, by estimating the slope of the curve at two points and comparing the results with the value of the function at those points. To do this we establish the equation

    S = ky,

where S = the estimated slope at point x,
      y = the value of the function y = 2(2^x) at point x,
      k = the constant of proportionality.


Fig. 16 -- Estimating the slope of the expression y = 2(2^x) at points P1 and P2

Two points, P1 and P2, have been selected, one at either end of the curve. It is obvious that the slopes at these two points are different. Let us assume that the curve extended an equal distance from each P in either direction (shown on the graph as the hypotenuse of the indicated triangles) is a straight line. Remembering the discussion of the straight line, we see that, having made this assumption, the coordinates of the vertexes of the triangles provide sufficient information to estimate the slope of the curve. The coordinates of the vertexes of the upper triangle are

    x2 = 2.00,  y2 = 8.000,
    x1 = 1.60,  y1 = 6.063.*

The  formula  for  calculating  the  slope  of  a  line  given  two  points  is 


■_\ 

"l 

and  on  substitution 


g  2-  8 . 000  -  6 . 0 h  3 

'  ’  2.  on  1  .  hf)  ’ 

S  -  •  8  1  i. 

The  x  coordinate  of  point  P^  is  the  average  of  g' ^  and  r  ,  (1.80). 
Where  this  point,  x^,  is  substituted  in  the  expression  y  =  2(2  3),  the 
resulting  value  of  y  is  ty.^b  4.  Returning  to  our  proport  iona  1  i  tv  state¬ 
ment 


we  substitute  appropriately  and  get 


•1  ,  8  •<  ' 


h  .  , 


=  i'.toJS, 


k 

The  y  coordinates  were  obtained  hv  substituting  the  x  coordinates 
in  the  expression  =  2(2“*);  values  to  three  decim.il  places  were  ob¬ 
tained  by  solving,  a  procedure  that  improves  the  agreement  between  the 
estimated  slopes. 


H 
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indicutlng  that  the  slope  of  the  curve  can  be  evaluated  at  any  point 
by  multiplying  the  value  of  the  function  at  that  point  by  0.695.  To 
check  this  we  use  point  in  exactly  the  way  we  did  and  derive 
another  estimate  of  the  slope  and  the  value  of  k. 

For the lower triangle

    $x_2 = -0.4$,  $y_2 = 1.516$,
    $x_1 = -0.8$,  $y_1 = 1.149$;

therefore the slope

    $S = \frac{1.516 - 1.149}{-0.4 + 0.8}$,
    $S = 0.918$.

The value of the function at $x_i$ (-0.6) is 1.320; therefore

    $0.918 = 1.320k$,
    $k = 0.695$;

and we are satisfied that the required constant of proportionality does
exist. The value of the expression at each of the two points was
calculated to four significant figures using the expression itself. Values
were not read from the curve. It is also interesting to note that for
the illustrative expression of the form $y = ab^x$, the value of b was 2.
Further, the natural logarithm of 2 is equal to 0.6931, which is quite
close to the value of k estimated above. The fact that our results
were no closer to the theoretical value of k is due largely to the
assumption that the curve was linear over the relevant range.

The constant of proportionality can be proved analytically to be
exactly equal to the natural logarithm of b. Further, in an exponential
of the form $y = e^x$, b is equal to e and the natural logarithm of e is
equal to 1. Thus for these kinds of exponential expressions it is
obvious that k must also be equal to 1.
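The chord-slope check described above can also be repeated numerically. The following is a minimal sketch, not part of the original Memorandum; the function names are illustrative assumptions. It estimates k for $y = 2(2^x)$ from the same two triangles and from a much narrower one, and prints ln 2 for comparison.

```python
import math

def f(x):
    """The illustrative exponential y = 2(2^x)."""
    return 2.0 * 2.0 ** x

def chord_slope(x_lo, x_hi):
    """Slope of the straight line joining the curve at x_lo and x_hi."""
    return (f(x_hi) - f(x_lo)) / (x_hi - x_lo)

def estimate_k(x_lo, x_hi):
    """Ratio of the chord slope to the function value at the midpoint."""
    x_mid = (x_lo + x_hi) / 2.0
    return chord_slope(x_lo, x_hi) / f(x_mid)

k_upper = estimate_k(1.6, 2.0)       # upper triangle
k_lower = estimate_k(-0.8, -0.4)     # lower triangle
k_narrow = estimate_k(1.799, 1.801)  # much shorter chord (hypothetical choice)

print(round(k_upper, 3), round(k_lower, 3))
print(round(k_narrow, 4), round(math.log(2), 4))
```

Shrinking the chord reduces the error introduced by treating the curve as a straight line, so the narrow estimate should sit closer to ln 2 than the two wide ones.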

Let us turn now from the digression to our main discussion. It
has probably already been noticed that the exponential expressed in
logarithmic form is a straight line function; i.e., the expression
$y = ab^x$ is equivalent to the expression $\ln y = \ln a + x \ln b$; notice
that ln a and ln b are constants. This fact greatly facilitates fitting
with exponential expressions.

If we are given the exponential $y = e^{a+bx}$ and we want to put it
into logarithmic form, we take the logs of each side. Although the
logarithm of y presents no problem, the logarithmic form of $e^{a+bx}$ may
appear to. When we remember the laws of exponents and the fact that
logarithms are in fact exponents, we find that a + bx is the logarithm
of y to the base e, and as such becomes the natural form of the
right-hand side of the equation

    $\ln y = a + bx$.

The exponential expression is therefore linear when stated in
terms of the logs of one of its members. To illustrate this, take
the following expression

    $\ln y = a + bx$.

We can recognize this as a semi-log straight line. If we were to
convert to exponential form we would have

    $y = e^{a+bx}$.

If we also examine the equation

    $y = ab^{x+2}$,

we find that this is another form of the exponential which can be
converted to log form as follows: First, we divide both sides of the
equation by a, which gives us

    $y/a = b^{x+2}$.

We can also express b as a power of e. We look up the natural log
of b, which, for lack of a better name, we will call r. We can now
write

    $y/a = (e^r)^{x+2}$,

or, again according to the third law of exponents,

    $y/a = e^{rx+2r}$.

The expression may be further simplified by letting 2r be represented
by the constant c. This produces

    $y/a = e^{rx+c}$.

Now converting to logarithmic form we have

    $\ln (y/a) = rx + c$,

which once again is a linear expression when the quotient y/a is
given in terms of logarithms.

The equation of an exponential passing through two points may be
determined quite simply. We need only set up the required functional
form in terms of logarithms, then proceed as with a straight line.

The following example illustrates this method.

Given the two points (1, 7) and (4, 1) on x, y coordinates, we choose
an expression of the form $y = ab^x$ as the appropriate general exponential
to fit. The next step is to restate this expression in terms of
logarithms as follows:

    $\ln y = \ln a + x \ln b$.

With the coordinates of the two points, $(x_1, y_1)$ and $(x_2, y_2)$, we
may write the two equations:

    $\ln y_1 = \ln a + x_1 \ln b$,
    $\ln y_2 = \ln a + x_2 \ln b$.

Taking the logarithms of $y_1$ and $y_2$ and substituting them, together
with the x values, in the above equations results in two equations with
two unknowns that may be solved simultaneously:

    $1.9459 = \ln a + 1 \ln b$,
    $0.0000 = \ln a + 4 \ln b$.

Subtracting the second equation from the first leaves

    $1.9459 = -3 \ln b$,
    $\ln b = -0.6486$.

This value can in turn be substituted into the first equation above
with the result that

    $1.9459 = \ln a - 0.6486$,
    $\ln a = 2.5945$,

and with this we can write the required expression as follows:

    $\ln y = 2.59 - 0.649x$.

The expression has been evaluated, and the results plotted in Fig. 17
pass exactly through the two points as required. We can simplify the
expression by converting it to exponential form. To do this we must
have the numbers represented by ln a and ln b. Looking in a table of
natural logarithms we find that

    $\ln a = 2.59 = \ln 13.4$,
    $\ln b = -0.649 = \ln 0.523$,

and we may write

    $y = 13.4(0.523)^x$.

Notice that 13.4, the value of a in this expression, is in fact the
y intercept. We can simplify still further by converting the expression
to one in terms of e. To do this we think of 13.4 as $e^r$ and
0.523 as $e^s$; thus

    $13.4 = e^r$,
    $0.523 = e^s$,

and we find the value of both r and s by again consulting a table of
natural logarithms. Another way to write the last two expressions is

    $r = \ln 13.4$

and

    $s = \ln 0.523$.

Fig. 17--Exponential fitted through two points

We really did not need to use the table again because the logarithms of
these values are already available from our calculations above:

    $s = \ln 0.523 = -0.649$,
    $r = \ln 13.4 = 2.59$.

Now if we substitute $e^r$ and $e^s$ in the equation $y = 13.4(0.523)^x$, we
have

    $y = (e^{2.59})(e^{-0.649})^x$.

Using the first and third laws of exponents we convert this expression
to

    $y = e^{(2.59 - 0.649x)}$.

Rewriting this expression in logarithmic form results in

    $\ln y = 2.59 - 0.649x$,

that is, the same expression that we had initially.
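The two-point procedure above can be sketched in a few lines of code. This is an illustrative sketch only; the function name `fit_exponential` is an assumption and does not come from the Memorandum. It works in logarithms exactly as the text does, then converts back to the form $y = ab^x$.

```python
import math

def fit_exponential(p1, p2):
    """Fit y = a * b**x through two points by solving
    ln y = ln a + x ln b as a straight line in (x, ln y)."""
    (x1, y1), (x2, y2) = p1, p2
    ln_b = (math.log(y2) - math.log(y1)) / (x2 - x1)  # slope of the log line
    ln_a = math.log(y1) - x1 * ln_b                   # intercept of the log line
    return ln_a, ln_b

ln_a, ln_b = fit_exponential((1, 7), (4, 1))
a, b = math.exp(ln_a), math.exp(ln_b)

print(round(ln_a, 4), round(ln_b, 4))   # coefficients of the log form
print(round(a, 1), round(b, 3))         # coefficients of y = a * b**x
```

Because $e^{\ln a + x \ln b} = ab^x$, the same pair (ln a, ln b) serves for the logarithmic form, the $ab^x$ form, and the form in terms of e.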

Power  Function 

The power function is one of the most commonly used mathematical
expressions in cost analysis because in many cases it adequately
describes the phenomenon of decreasing costs of successive units of
production. The general equation of this function is

    $y = ax^b$.

To avoid confusing the power function with the exponential, which
looks somewhat similar, we must observe the placement of the variable,
x. In the power function, the variable x is raised to the power b,
while in the exponential, a constant is raised to the variable power
x as below:

    $y = ab^x$.

The characteristics of the power function can best be illustrated
by initially setting a equal to 1, because a affects primarily the level
of the curve and has little influence on its shape. Having done this
we are left with the equation

    $y = x^b$.

For certain values of b, the function is not continuous for negative
values of x. Therefore, we will restrict the variable x to values
greater than 0; the exponent b can assume any value, positive, negative,
or 0. However, when b is 0, the equation becomes

    $y = 1$,

for any value raised to the 0 power is equal to 1. When b is positive
and varies from 0, the family of curves shown in Fig. 18 results.

The smallest value assigned to b in Fig. 18 is 0.2. Had 0.0 been
used, the result would have been a straight line parallel to the x
axis and passing through y = 1 as above. When b is between 0 and 1
the curves generated are concave downward. When b = 1 a straight line
(y = x) results, because any number raised to the first power is the
number itself. As values greater than 1 are assigned to b, the curves
become concave upwards. The curves pass through the point x = 1,
y = 1 for all values of b, because 1 raised to any power always equals 1.

When x is greater than 1 the curves rotate upwards as the value
of b increases. When x is between 1 and 0, however, the situation is
not the same, as shown by Fig. 19. At the upper end all of the curves
go through the point x = 1, y = 1 as before, but at the lower end they
all tend towards the point x = 0, y = 0. When b is made smaller the
curves become higher, the reverse of what happened when x was greater
than 1.* When b is greater than 1 the curves are concave upwards; when
b is less than 1 the curves are concave downwards. As before, when
b is equal to 1 the curve is the straight line y = x.

When the exponent b is negative, the family of curves shown in
Fig. 20 is generated. Regardless of the value of b when it is negative,
the curves are concave upwards. As before, however, each of the curves
passes through the point x = 1, y = 1, and for values of x greater
than 1 the curves with the lower absolute values of b lie above those
with higher absolute values of b. When x is less than 1, however, the
reverse is true.

When b is equal to -1 we do not have a straight line, as was the case
when b was equal to +1; in this case the resulting equation is

    $y = \frac{1}{x}$,

which is a reciprocal or a form of hyperbola.

Figure 21 illustrates the effect of including a in the equation.
When a is increased from 1 the curves shift upwards by direct
multiplication. When a is reduced from 1 to 0 the curves shift similarly but
in a downward direction.

* This is true because a decimal raised to a power greater than
1 gives a smaller number. Also, a decimal raised to a positive power
less than 1 gives a larger number than itself.

In most cost analysis applications only that part of the curve
where x is ≥ 1 is of interest. Since we are generally concerned with
depicting a cost-quantity relationship with y the cost and x the
quantity, we are not concerned with the cost of less than one unit. But
both because the curve might be useful for other applications and
because of its special behavior, we should become familiar with its
more general characteristics.*

* It should also be pointed out that this curve is undefined for
a negative b and x = 0. y in this case goes to infinity in a positive
direction.

The power function also is linear when expressed in terms of
logarithms. Returning to the general equation

    $y = ax^b$

and taking the logarithms of both sides, we have

    $\log y = \log a + b \log x$,

which is quite clearly a linear expression in logarithms. As most
curve-fitting techniques are simpler when handling linear relationships,
it is common to make this transformation before fitting the
power function. Figure 22 shows the complete family of power functions
plotted using logarithmic coordinates.

When two lines (power functions) are parallel on logarithmic
coordinates, they differ from each other by a constant ratio or
percentage, which is contrary to the case where two parallel lines on
arithmetic coordinates differ by a constant number. To demonstrate
analytically, assume that we have two curves, one 50 percent higher
than the other. The equations for these curves are

    $y_1 = ax^b$,
    $y_2 = 1.5ax^b$,

or, in logarithmic form,

    $\log y_1 = \log a + b \log x$,
    $\log y_2 = \log 1.5 + \log a + b \log x$.

In the second equation let the constant terms log 1.5 + log a be set
equal to log a'. We then have

    $\log y_1 = \log a + b \log x$,
    $\log y_2 = \log a' + b \log x$.

The only difference between the two occurs in the constant terms log
a and log a'. When these two equations are plotted on logarithmic grids
there will be two parallel lines with the spacing between them equal to
log a' - log a, or log 1.5.
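The constant spacing on logarithmic coordinates can be checked numerically. The sketch below is illustrative only: the coefficients a = 100 and b = -0.32 and the sample x values are hypothetical, chosen merely to resemble a cost-quantity curve. It compares the vertical gap between a power function and one 50 percent higher, first in logarithms and then in ordinary units.

```python
import math

def power_fn(a, b, x):
    """Power function y = a * x**b."""
    return a * x ** b

a, b = 100.0, -0.32      # hypothetical level and exponent
ratio = 1.5              # second curve is 50 percent higher
xs = [1, 2, 5, 10, 100, 1000]

# Gap between the two curves measured on logarithmic coordinates.
log_gaps = [math.log10(power_fn(ratio * a, b, x)) - math.log10(power_fn(a, b, x))
            for x in xs]

# Gap between the same two curves measured on arithmetic coordinates.
arith_gaps = [power_fn(ratio * a, b, x) - power_fn(a, b, x) for x in xs]

print([round(g, 4) for g in log_gaps])
print([round(g, 2) for g in arith_gaps])
print(round(math.log10(ratio), 4))
```

The logarithmic gap stays at log 1.5 for every x, while the arithmetic gap changes with x, which is the distinction the text draws between parallel lines on the two kinds of grid.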

Fitting the power function through two points can be accomplished
by transforming both variables into logarithms and proceeding as with a
linear case. To review the method, see the discussion of the straight
line.

FITTING CURVES TO TWO-VARIABLE RELATIONSHIPS

THE  STRAIGHT  LINE 

Although there are many methods of fitting straight lines, three
are usually sufficient for fitting curves to two-variable relationships:
the method of selected points, the method of averages, and the method
of least squares. Of course one can always draw the curves freehand,
but even so the equation of the line must be determined by using one
of the other methods.

The Method of Selected Points

When it is apparent that data plotted on rectangular coordinates
can be described by a straight line, the equation of that line can be
found using the method of selected points. With the use of a straight
edge, a line is drawn through the points such that the points are
uniformly distributed around the line. Two points are then read from the
line near the extremities and substituted in the equation

    $y = a + bx$.

The two equations are then solved simultaneously for a and b.

In the example shown in Fig. 23 the two points selected were
$P_1$ (4, 8.3) and $P_2$ (26, 24). The two equations were therefore

    $8.3 = a + 4b$,
    $24.0 = a + 26b$.

When the first is subtracted from the second, the result is

    $15.7 = 22b$,
    $b = 0.714$.

Fig. 23--Straight line fitted using the method of selected points

When b is substituted in the first equation

    $8.3 = a + (4)(0.714)$,
    $a = 5.44$.

The equation of the desired line is

    $y = 5.44 + 0.714x$.
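Solving the two simultaneous equations amounts to computing the slope between the selected points and then backing out the intercept. A minimal sketch follows; the helper name `line_through` is an assumption made for illustration.

```python
def line_through(p1, p2):
    """Return (a, b) of the line y = a + b*x passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)   # subtract the first equation from the second
    a = y1 - b * x1             # substitute b back into the first equation
    return a, b

# The two points read from the freehand line in the example.
a, b = line_through((4, 8.3), (26, 24.0))
print(round(a, 2), round(b, 3))
```

Carrying full precision through the calculation can move the last printed digit of a slightly relative to a hand calculation that rounds b first.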


When values of x taken from the original data are substituted
in this equation, values of y corresponding to each of the original
values can be calculated. One way of showing how well the line fits
the data is to compare the calculated values with the original values
using a percent deviation for each data point as in Table 1. When
computing the percent deviation, it is usual to base the percentages
on the observed values.* For example, the percent deviation for the
first data point is

    $\frac{8.00 - 6.87}{8.00} \times 100 = 14.1$.


Table 1

USING THE METHOD OF SELECTED POINTS TO
FIT A STRAIGHT LINE: DATA AND RESULTS

     y       x     y (calc.)   Percent Deviation
     8       2       6.87            14.1
     6       4       8.30           -38.3
    12       6       9.72            19.0
    10      10      12.58           -25.8
    14      12      14.01            -0.1
    18      16      16.86             6.3
    20      22      21.15            -5.8
    28      24      22.58            19.4
    26      26      24.00             7.7
    22      30      26.86           -22.1
                                     15.9 av

After  the  percent  deviations  are  calculated  for  each  data  point, 
an  average  percent  deviation  can  be  calculated  by  adding  each  devia¬ 
tion,  disregarding  the  sign,  and  dividing  the  total  by  the  number  of 
data  points. 
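The percent deviation and its average can be computed directly from the data. The sketch below is illustrative; the function names are assumptions, and the data are the ten observations of the example written as (x, y) pairs. Because it does not round the calculated y values before dividing, its results can differ from the tabulated ones in the last digit.

```python
# Observed data from the example, as (x, y) pairs.
DATA = [(2, 8), (4, 6), (6, 12), (10, 10), (12, 14),
        (16, 18), (22, 20), (24, 28), (26, 26), (30, 22)]

def percent_deviations(data, a, b):
    """Signed percent deviation of each point, based on the observed y."""
    return [(y - (a + b * x)) / y * 100.0 for x, y in data]

def average_percent_deviation(data, a, b):
    """Mean of the percent deviations with the signs disregarded."""
    devs = percent_deviations(data, a, b)
    return sum(abs(d) for d in devs) / len(devs)

devs = percent_deviations(DATA, 5.44, 0.714)
avg = average_percent_deviation(DATA, 5.44, 0.714)
print([round(d, 1) for d in devs])
print(round(avg, 1))
```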

Notice that the placement of the line in the example was quite
arbitrary; this is one of the weaknesses of the method of selected
points. The same equation could have been arrived at in a slightly
different manner. Had the line been extended to the left until it
intercepted the y axis, we could have read the value of a directly
from the graph and calculated the slope as follows:

    $b = (24 - 8.3)/(26 - 4) = 0.714$.

Still another method using the two points given and the modified point
slope formula would be

    $y - 8.3 = \frac{24 - 8.3}{26 - 4}(x - 4)$,
    $y = 5.44 + 0.714x$.

* Percent deviation = $\frac{y \text{ observed} - y \text{ calculated}}{y \text{ observed}} \times 100$.

The  Method  of  Averages 

To use the method of averages the data must first be arrayed in
ascending order according to one of the variables. Second, the numbers
are divided so that two approximately equal groups are formed. If the
number of data points is even, there should be an equal number in each;
if odd, the extra point will have to be placed in one group or the
other. The average value of each of the variables is calculated for
each group and substituted into the equation

    $y = a + bx$.

Two equations result as before; these are solved simultaneously for
a and b.


In the example shown in Fig. 24 and Table 2, the data are ordered
according to the variable x. As there are an even number of data
points (10), 5 are to be assigned to each group. When average values
for both x and y are calculated for each group, they define the two
points (6.8, 10) and (23.6, 22.8), which are then plotted and a straight
line connecting them drawn. The equation of the line is obtained by
substituting the coordinates of the average points into the equation
y = a + bx as before and solving the two equations simultaneously for
a and b.

    $10 = a + 6.8b$,
    $22.8 = a + 23.6b$.

Fig. 24--Straight line fitted using the method of averages

When the first is subtracted from the second, the result is

    $12.8 = 16.8b$,
    $b = 0.762$;

and substituting b in the first equation,

    $10 = a + (6.8)(0.762)$,
    $a = 4.82$.

Therefore,

    $y = 4.82 + 0.762x$.

Percent and average percent deviations as calculated are shown
in Table 2.
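The method of averages can be sketched as follows. This is an illustrative sketch, not the Memorandum's own procedure in code form; the function name is an assumption, and so is the choice to put the extra point in the upper group when the number of points is odd, since the text leaves that placement to the analyst.

```python
def fit_by_averages(points):
    """Fit y = a + b*x through the average points of the lower and upper
    halves of the data, after ordering by x."""
    pts = sorted(points)                 # ascending order of x
    half = len(pts) // 2                 # odd counts: extra point goes to upper group (assumption)
    groups = (pts[:half], pts[half:])
    means = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
             for g in groups]
    (x1, y1), (x2, y2) = means
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b, means

DATA = [(2, 8), (4, 6), (6, 12), (10, 10), (12, 14),
        (16, 18), (22, 20), (24, 28), (26, 26), (30, 22)]

a, b, means = fit_by_averages(DATA)
print(means)
print(round(a, 2), round(b, 3))
```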

Table 2

USING THE METHOD OF AVERAGES TO FIT A
STRAIGHT LINE: DATA AND RESULTS

     y         x        y (calc.)   Percent Deviation
     8         2          6.34            20.7
     6         4          7.87           -31.2
    12         6          9.39            21.7
    10        10         12.44           -24.4
    14        12         13.96             0.3
    10 av      6.8 av      --              --
    18        16         17.01             5.5
    20        22         21.58            -7.9
    28        24         23.11            17.5
    26        26         24.63             5.3
    22        30         27.68           -25.8
    22.8 av   23.6 av      --             16.0 av

A slightly different version of the method of averages is sometimes
used. Instead of calculating average values as was done before,
the data are used to establish ten separate linear equations as follows:

     8 = a +  2b          18 = a + 16b
     6 = a +  4b          20 = a + 22b
    12 = a +  6b          28 = a + 24b
    10 = a + 10b          26 = a + 26b
    14 = a + 12b          22 = a + 30b
    ------------         --------------
    50 = 5a + 34b        114 = 5a + 118b

These groupings are preserved and the equations appearing in each
group are added to obtain the required two equations, which are solved
simultaneously for a and b:

    $50 = 5a + 34b$,
    $114 = 5a + 118b$.

Subtracting the first from the second,

    $64 = 84b$,
    $b = 0.762$;

and substituting b in the first equation

    $50 = 5a + (34)(0.762)$,
    $a = 4.82$.

The same solution results from either averaging or adding. Although
the method of averages is simple to use and does give a reproducible
solution, it does not ensure that a best fitting straight line will
be chosen.

The Method of Least Squares

The method of least squares is probably the most widely used
method of obtaining empirical equations. The term "least squares"
reflects the criterion used to determine the desired equation. The
line is chosen such that the sum of the squared deviations of the
points from the line is minimized.* The way this criterion is used
in the formula for calculating a least-squares line is worked out in
full in Appendix A. Briefly it is as follows:

We are seeking an equation of a straight line

    $y = a + bx$

such that the sum of the squared distances of the data from the line
will be minimal.

The sum of the squared deviations ($\sum d^2$) is expressed mathematically
as

    $\sum d^2 = \sum (y - a - bx)^2$.

To obtain values of a and b so that this expression can be minimized,
it is necessary to take partial derivatives with respect to both a
and b** and to equate them to 0. These partial derivatives are

    $\frac{\partial \sum d^2}{\partial a} = -2\sum y + 2Na + 2b\sum x = 0$

and

    $\frac{\partial \sum d^2}{\partial b} = -2\sum xy + 2a\sum x + 2b\sum x^2 = 0$;

and with the obvious simplifications made, the resultant two equations

    $\sum y = Na + b\sum x$,
    $\sum xy = a\sum x + b\sum x^2$

are called the normal equations for fitting a least-squares straight
line. Each of the values other than a and b can readily be determined
from the data, leaving two equations with two unknowns that can be
solved simultaneously.

* Where the equation is of the form y = a + bx, the distances are
measured parallel to the y axis. Conversely, where the equation is of
the form x = a + by, the distances are measured parallel to the x axis.

** This is a method of calculus included here only as a matter of
interest. Its comprehension is not essential to anything that follows
in this text.

In applying this method it is convenient to array the data as in
the first two columns of Table 3. (The ordering of the values is not
essential, although it tends to make checking the calculations easier.)
The next step is to square each entry in the column headed x and enter
the product in the column headed $x^2$. The entries in the column headed
xy are the products of the x and y values for each point. After these
calculations have been made, each column is totaled as shown.

Table 3

USING THE METHOD OF LEAST SQUARES TO FIT
A STRAIGHT LINE: WORKSHEET

      y      x     x^2      xy    y (calc.)     d     Percent Deviation     d^2
      8      2       4      16      7.07      0.93         11.7           0.865
      6      4      16      24      8.49     -2.49        -41.5           6.200
     12      6      36      72      9.90      2.10         17.5           4.410
     10     10     100     100     12.73     -2.73        -27.3           7.453
     14     12     144     168     14.14     -0.14         -1.0           0.020
     18     16     256     288     16.97      1.03          5.7           1.061
     20     22     484     440     21.21     -1.21         -6.1           1.464
     28     24     576     672     22.63      5.37         19.2          28.837
     26     26     676     676     24.04      1.96          7.5           3.842
     22     30     900     660     26.87     -4.87        -22.1          23.717
    164    152   3,192   3,116       --        --          16.0 av       77.869

N is the number of data points; the remaining values required by the
normal equations are the totals of the appropriate columns. When these
are substituted in the normal equations, we have

    $164 = 10a + 152b$,
    $3116 = 152a + 3192b$,

and when these are solved simultaneously,

    $a = 5.66$,
    $b = 0.707$.

The equation is therefore

    $y = 5.66 + 0.707x$.
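The worksheet totals and the solution of the normal equations can be reproduced with a short routine. The sketch below is illustrative; the function name is an assumption. It forms the four column totals and solves the two normal equations $\sum y = Na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$ by elimination.

```python
def least_squares_line(points):
    """Solve the normal equations for the line y = a + b*x."""
    n = len(points)
    sx = sum(x for x, _ in points)        # total of the x column
    sy = sum(y for _, y in points)        # total of the y column
    sxx = sum(x * x for x, _ in points)   # total of the x^2 column
    sxy = sum(x * y for x, y in points)   # total of the xy column
    # Eliminate a between the two normal equations to get b, then back-substitute.
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b, (sy, sx, sxx, sxy)

DATA = [(2, 8), (4, 6), (6, 12), (10, 10), (12, 14),
        (16, 18), (22, 20), (24, 28), (26, 26), (30, 22)]

a, b, totals = least_squares_line(DATA)
print(totals)
print(round(a, 2), round(b, 3))
```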

Figure 25 shows the data and the straight line described by this
equation. The average percent deviation is calculated as before.
Another measure of goodness of fit that is often used, particularly in
conjunction with least squares, is the standard error of the estimate
of y, S. This measure is obtained by squaring each of the deviations,
adding the results, dividing the total by N - 2 (where N is the number
of data points), and taking the square root of the result, e.g.,

    $S = \pm\sqrt{\frac{\sum d^2}{N - 2}}$.

The standard error of the estimate of y allows a heavier penalty
for extreme data points than does the average percent deviation method,
yet it is more difficult to interpret.

Fig. 25--Straight line fitted using the method of least squares

In our example

    $S = \pm\sqrt{\frac{77.869}{8}}$,
    $S = \pm 3.120$.

Since the standard error of the estimate of y is expressed in the
same units as the variable y, it is often better, when making comparisons,
to convert to a non-dimensional number. This is usually done by dividing
S by the arithmetic mean of y, $\bar{y}$:

    $V = (S/\bar{y})100$,

or

    $V = \frac{3.12}{16.4} \times 100$;
    $V = 19.02\%$,

where V is called the coefficient of variation.
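Both measures can be computed from the fitted line and the data. The sketch below is illustrative; the function names are assumptions, and the divisor N - 2 is taken from the worked example, which divides the sum of squared deviations for ten points by 8. Deviations are not rounded before squaring, so the last digit can differ slightly from a hand worksheet.

```python
import math

DATA = [(2, 8), (4, 6), (6, 12), (10, 10), (12, 14),
        (16, 18), (22, 20), (24, 28), (26, 26), (30, 22)]

def standard_error(points, a, b, fitted_constants=2):
    """Standard error of the estimate of y for the line y = a + b*x."""
    sum_sq = sum((y - (a + b * x)) ** 2 for x, y in points)
    return math.sqrt(sum_sq / (len(points) - fitted_constants))

def coefficient_of_variation(points, a, b):
    """Standard error expressed as a percentage of the mean of y."""
    mean_y = sum(y for _, y in points) / len(points)
    return standard_error(points, a, b) / mean_y * 100.0

S = standard_error(DATA, 5.66, 0.707)
V = coefficient_of_variation(DATA, 5.66, 0.707)
print(round(S, 2), round(V, 1))
```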

Summary 

Of the three methods for fitting straight lines the method of
selected points is the easiest to use, but when the required line is
not obvious from the data, the choice is strictly a matter of judgment
and the results are not easily reproducible. The method of averages
provides a relatively simple and yet definitive way of choosing
an equation (although not necessarily the best fitting line). The
only place where uncertainty enters the picture is in the placement
of the odd data point. The method of least squares requires more in
the way of calculation but provides a completely unambiguous way of
selecting and fitting the line; for that and other reasons it is the
most widely used method.

An average percent deviation can be used to show how well the line
fits the data. It is easy to calculate and to interpret. The standard
error of the estimate of y, while requiring more calculations to be made,
is more widely used because of its implications in drawing statistical
inferences. The coefficient of variation, a non-dimensional term based
on the standard error, is useful especially for making comparisons.

Although  the  results  of  applying  the  three  different  methods  to 
the  same  data,  as  displayed  in  Fig.  26,  are  strikingly  similar,  the 
user  cannot  expect  that  this  would  always  be  the  case. 

THE  PARABOLA 

The same three methods used to fit the straight line may be used
to fit the parabola. To fit the parabola, both the method of selected
points and the method of averages rely even more on the judgment of
the cost analyst than was the case with respect to the straight line.
For this and other reasons, the method of least squares is preferred.
Measuring goodness of fit is the same as it was for the straight line
and will not be discussed again. There are two forms of the parabola
which may be used by the cost analyst. The first is

    $y = a + bx + cx^2$,

which is illustrated in Fig. 27. The second is

    $x = a + by + cy^2$,

or, in terms of y,

    $y = \frac{-b \pm \sqrt{b^2 - 4c(a - x)}}{2c}$,

which is illustrated in Fig. 28.

Fig. 27--Parabola form 1

Fig. 28--Parabola form 2

In the first case, the line of symmetry of the parabola is parallel
to the y axis and in the second it is parallel to the x axis. Each form
will be examined separately. Form 2 requires y to be the squared term.
In order to accomplish this, the equation of the first form is written
interchanging y and x as follows:

    $x = a + by + cy^2$;

and solving for y using the quadratic formula

    $cy^2 + by + a - x = 0$,

    $y = \frac{-b \pm \sqrt{b^2 - 4c(a - x)}}{2c}$.

Parabola Form 1

The Method of Selected Points. When it is clear that the
relationship described by the data can be represented by a segment of a
parabola, the method of selected points may be used. First, draw a
freehand curve roughly the shape of a parabola through the data, and
read three points from the curve, two at the extremities and one
somewhere in the middle. If there is a relatively sharp maximum or
minimum point (vertex), it would be best to read the third point from the
area of the vertex. Then substitute three points in the equation

    $y = a + bx + cx^2$,

and solve the resulting three equations simultaneously for a, b, and
c. Notice that x appears twice in each equation, once as it is read
from the curve and once squared.

In the example shown in Fig. 29, numbers for which are given in Table
4, we draw the freehand curve indicated by the solid line and read from
the curve the points (18, 17), (12, 10) and (1, 2).

Table 4

USING THE METHOD OF SELECTED POINTS TO FIT THE
PARABOLA FORM 1: DATA AND RESULTS

    y      x     y (calc)    Percent Deviation
    3      1       2.00           -33.3
    2      3       2.99
    4      4       3.56
    5      8       6.37            27.4
    7     10       8.08            15.4
   10     11       9.01            -9.9
   12     14      12.13             1.1
   15     17      15.70             4.7
   18     18                       -5.6
   17     19                        7.9
   --     --        --             16.6 av

The required three equations are

17 = a + 18b + 324c,
10 = a + 12b + 144c,
 2 = a + b + c.

Subtracting the second equation from the first, and the third from
the second, eliminates a:

7 = 6b + 180c,
8 = 11b + 143c.

To eliminate b, we multiply the first of these equations by 11 and the
second by 6, then subtract the second from the first:

29 = 1122c,
c = 0.0259.

Substituting c in the first equation,

7 = 6b + (180)(0.0259),
b = 0.391.

Next, we substitute both b and c in the third of the original
equations,* and calculate a as follows:

2 = a + b + c,
a = 2 - 0.391 - 0.0259,
a = 1.582.

The required equation is therefore

y = 1.582 + 0.391x + 0.0259x^2.

The graph of this equation is shown by the dashed line in Fig. 29.
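The elimination above can be checked with a short sketch. It uses a closed form (divided differences) that is algebraically equivalent to solving the three equations simultaneously; the function name `parabola_through` is an illustrative assumption, not something from the Memorandum. Because the sketch carries full precision, the last printed digit can differ slightly from the hand calculation.

```python
def parabola_through(p1, p2, p3):
    """Return (a, b, c) of y = a + b*x + c*x**2 through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    slope12 = (y2 - y1) / (x2 - x1)      # chord slope between points 1 and 2
    slope13 = (y3 - y1) / (x3 - x1)      # chord slope between points 1 and 3
    c = (slope13 - slope12) / (x3 - x2)  # second divided difference
    b = slope12 - c * (x1 + x2)
    a = y1 - b * x1 - c * x1 ** 2
    return a, b, c


# The three points read from the freehand curve in the example.
a, b, c = parabola_through((18, 17), (12, 10), (1, 2))
print(f"y = {a:.3f} + {b:.3f}x + {c:.4f}x^2")
```

A different choice of three points gives a different parabola, which is the ambiguity the text attributes to this method.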

As with the straight line, the chances are small that two analysts
independently using this method would arrive at the same result. The
only argument in favor of it is that it is relatively simple to use.

* The discrepancies in the arithmetic result from the fact that more
decimal places than those shown were used in making the actual
calculations.

The Method of Averages. Three points similar to those obtained in the
method of selected points by arbitrarily choosing them from a freehand
curve may be obtained by averaging. In this method, we array the data
in ascending order of one of the variables, form three groups of
approximately equal size, and calculate the average values of both x
and y for each group. We substitute the average points in the equation

y = a + bx + cx^2,

and solve the resulting three equations simultaneously for a, b, and c.

Table 5 illustrates the procedure for the example case. The 10
numbers are arrayed in ascending order according to the value of x.
Three groups are formed by assigning 3 points to the first and last
groups and 4 points to the middle one. (The assignment of the odd
point is arbitrary.)

Table 5

USING THE METHOD OF AVERAGES TO FIT THE
PARABOLA FORM 1: WORKSHEET

    y           x           y (calc)    Percent Deviation
    3           1             2.34           22.0
    2           3             3.15          -57.5
    4           4             3.64            9.0
    3 av        2.67 av        --             --
    5           8             6.20          -24.0
    7          10             7.83          -11.9
   10          11             8.73           12.7
   12          14            11.78            1.8
    8.5 av     10.75 av        --             --
   15          17            15.36           -2.4
   18          18            16.67            7.4
   17          19            18.04           -6.1
   16.67 av    18 av           --            15.5 av

By averaging, we obtain

 3.00 = a +  2.67b +   7.129c,
 8.50 = a + 10.75b + 115.563c,
16.67 = a + 18.00b + 324.000c.

When these equations are solved simultaneously, the result is

a = 2.018,
b = 0.290,
c = 0.0291.

The equation of the required parabola is therefore

y = 2.018 + 0.290x + 0.0291x^2.

This method is less arbitrary than the method of selected points,
but some ambiguity does exist because of having to assign any odd data
point to one of the three groups. Further, averaging may prevent us
from choosing a point near the vertex of the curve.
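The grouping and averaging steps can be sketched as follows. The 3-4-3 split follows Table 5; the helper names (`mean`, `solve`, `GROUPS`) are illustrative assumptions. The sketch keeps the unrounded group average 2.667 where the worksheet uses 2.67, so the coefficients agree with the text only to about three figures.

```python
x = [1, 3, 4, 8, 10, 11, 14, 17, 18, 19]
y = [3, 2, 4, 5, 7, 10, 12, 15, 18, 17]
GROUPS = [(0, 3), (3, 7), (7, 10)]  # index ranges: 3, 4 and 3 points


def mean(values):
    return sum(values) / len(values)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


# One average point per group; the squared term uses the averaged x.
avg_points = [(mean(x[i:j]), mean(y[i:j])) for i, j in GROUPS]
matrix = [[1.0, xb, xb * xb] for xb, _ in avg_points]
rhs = [yb for _, yb in avg_points]
a, b, c = solve(matrix, rhs)
print(f"y = {a:.3f} + {b:.3f}x + {c:.4f}x^2")
```

Moving the odd point to the first or last group only requires changing `GROUPS`, which makes it easy to see how much the arbitrary assignment affects the result.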

The Method of Least Squares. To fit a parabola using the method of
least squares, we must solve three normal equations simultaneously
for the values of the coefficients a, b, and c in the equation

y = a + bx + cx^2.

The normal equations are

\sum y = Na + b \sum x + c \sum x^2,
\sum xy = a \sum x + b \sum x^2 + c \sum x^3,
\sum x^2 y = a \sum x^2 + b \sum x^3 + c \sum x^4.

The least-squares criterion and details of the derivation of these
equations, which are described fully in the Appendix, are similar to
those for the straight line. As in previous discussions, all of the
information required to solve the equations can be obtained from the
data.

In a worksheet similar to the one shown in Table 6, we array the
data as in the first two columns, making entries in the other columns
after performing the calculations indicated by the headings. N is the
number of data points, and the column totals provide the other
necessary inputs to the normal equations.

Table 6

USING THE METHOD OF LEAST SQUARES TO FIT
THE PARABOLA FORM 1: WORKSHEET

    y     x    x^2     x^3       x^4     xy   x^2 y   y (calc)  Percent Deviation
    3     1      1       1         1      3       3     2.28          24.0
    2     3      9      27        81      6      18     3.04         -52.0
    4     4     16      64       256     16      64     3.51          12.0
    5     8     64     512     4,096     40     320     6.03         -20.6
    7    10    100   1,000    10,000     70     700     7.66          -9.4
   10    11    121   1,331    14,641    110   1,210     8.58          14.2
   12    14    196   2,744    38,416    168   2,352    11.69           2.6
   15    17    289   4,913    83,521    255   4,335    15.37          -2.5
   18    18    324   5,832   104,976    324   5,832    16.72           7.1
   17    19    361   6,859   130,321    323   6,137    18.13          -6.6
   93   105  1,481  23,283   386,309  1,315  20,971      --           15.1 av

The equations to be solved for the example shown in Fig. 31 and
Table 6 are:

   93 =   10a +   105b +   1481c,
 1315 =  105a +  1481b +  23283c,
20971 = 1481a + 23283b + 386309c.

Thus the desired equation is

y = 1.993 + 0.254x + 0.0314x^2.
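The worksheet arithmetic can be reproduced with a minimal sketch that accumulates the column totals, assembles the three normal equations, and solves them. The helper names (`power_sum`, `solve`) are illustrative assumptions; the data are the y and x columns of Table 6.

```python
x = [1, 3, 4, 8, 10, 11, 14, 17, 18, 19]
y = [3, 2, 4, 5, 7, 10, 12, 15, 18, 17]


def power_sum(p, q=0):
    """Column total of x**p * y**q, e.g. power_sum(2, 1) is the x^2 y total."""
    return sum(xi ** p * yi ** q for xi, yi in zip(x, y))


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


n = len(x)
# Left-hand coefficients of the three normal equations.
matrix = [
    [n, power_sum(1), power_sum(2)],
    [power_sum(1), power_sum(2), power_sum(3)],
    [power_sum(2), power_sum(3), power_sum(4)],
]
# Right-hand sides: totals of y, xy and x^2 y.
rhs = [power_sum(0, 1), power_sum(1, 1), power_sum(2, 1)]
a, b, c = solve(matrix, rhs)
print(f"y = {a:.3f} + {b:.3f}x + {c:.4f}x^2")
```

Unlike the two earlier methods, nothing here depends on the analyst's judgment, so any two people working from the same data obtain the same coefficients.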

The method of selected points, the method of averages, and the
method of least squares can each be used to fit a parabola. Because
the method of least squares results in an unambiguous solution, it is
usually preferred.

Figure  32  shows  a  comparison  of  the  results  obtained  by  using 
each  of  these  methods  to  fit  a  parabola  to  the  example  data. 


Fig.  32 — Fitting  a  parabola  form  1  using  three  alternate  methods 
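As a numerical companion to the comparison, the sketch below recomputes the average absolute percent deviation for the three fitted equations, using the rounded coefficients quoted in the text. The names `FITS` and `avg_abs_pct_dev` are illustrative assumptions; small differences from the worksheet averages are expected because the worksheets carried more decimal places.

```python
x = [1, 3, 4, 8, 10, 11, 14, 17, 18, 19]
y = [3, 2, 4, 5, 7, 10, 12, 15, 18, 17]

# (a, b, c) for y = a + b*x + c*x**2, as quoted for each method.
FITS = {
    "selected points": (1.582, 0.391, 0.0259),
    "averages": (2.018, 0.290, 0.0291),
    "least squares": (1.993, 0.254, 0.0314),
}


def avg_abs_pct_dev(coeffs):
    """Mean of |actual - calculated| / actual, expressed in percent."""
    a, b, c = coeffs
    devs = [abs(yi - (a + b * xi + c * xi * xi)) / yi * 100
            for xi, yi in zip(x, y)]
    return sum(devs) / len(devs)


scores = {name: avg_abs_pct_dev(coeffs) for name, coeffs in FITS.items()}
for name, score in scores.items():
    print(f"{name:16s} {score:5.1f} percent")
```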


Parabola  Form  2 

The equation for this class of parabolas is

x = a + by + cy^2.

The only difference between this and the equation for the parabola
form 1 is that x and y have been interchanged. In fitting this form
we invert the relationship between x and y in the data and proceed as
before until a, b, and c have been calculated. At that point the
equation in y as above must be solved for y using the quadratic
formula. The desired result will typically be one of the two possible
solutions; the appropriate one can best be determined by
experimentation. While all three curve-fitting methods can be used
here also, we shall only illustrate the method of least squares.

The Method of Least Squares. The normal equations necessary to fit
this kind of parabola are the same as for form 1 but with x and y
interchanged as

\sum x = Na + b \sum y + c \sum y^2,
\sum xy = a \sum y + b \sum y^2 + c \sum y^3,
\sum xy^2 = a \sum y^2 + b \sum y^3 + c \sum y^4.

When the appropriate values are calculated as shown in Table 7
and substituted in the above equations, we have

   92 =   10a +   110b +   1530c,
 1370 =  110a +  1530b +  23690c,
22076 = 1530a + 23690b + 387858c.

Table 7

USING THE METHOD OF LEAST SQUARES TO FIT
THE PARABOLA FORM 2: WORKSHEET

    y     x    y^2     y^3       y^4     xy    xy^2   y (calc)  Percent Deviation
    2     1      4       8        16      2       4     0.76          62.0
    4     2     16      64       256      8      32     3.99           0.3
    6     3     36     216     1,296     18     108     5.80           3.3
    8     5     64     512     4,096     40     320     8.40          -5.0
   10     9    100   1,000    10,000     90     900    12.13         -21.3
   12     6    144   1,728    20,736     72     864     9.46          21.2
   15    12    225   3,375    50,625    180   2,700    14.33           4.5
   16    16    256   4,096    65,536    256   4,096    16.84          -5.3
   18    18    324   5,832   104,976    324   5,832    17.97           0.2
   19    20    361   6,859   130,321    380   7,220    19.04          -0.2
  110    92  1,530  23,690   387,858  1,370  22,076      --           12.3 av

The solution is

a = 0.913,
b = 0.0783,
c = 0.0485;

and the equation sought is

x = 0.913 + 0.0783y + 0.0485y^2.

Since our objective is to use x to estimate y, we must solve this
equation for y. We can do this by writing it as a quadratic equation
in y:

0.0485y^2 + 0.0783y + (0.913 - x) = 0,

and using the quadratic formula we obtain

y = \frac{-0.0783 \pm \sqrt{0.0783^2 - 4(0.0485)(0.913 - x)}}{2(0.0485)},

which simplifies to

y = \frac{-0.0783 \pm \sqrt{-0.171 + 0.194x}}{0.0971}.

We can see on inspection that the solution in which the square-root
term is negative is not useful; the correct equation is

y = \frac{-0.0783 + \sqrt{-0.171 + 0.194x}}{0.0971}.
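The inversion can be sketched as a small function that takes the positive root. The coefficients are the rounded values quoted above; the function name `y_from_x` is an illustrative assumption. The sketch divides by 2c = 0.0970 where the text shows 0.0971, so results agree with the y (calc) column of Table 7 only to about two decimal places.

```python
import math

# Rounded least-squares coefficients of x = A + B*y + C*y**2 from the text.
A, B, C = 0.913, 0.0783, 0.0485


def y_from_x(x):
    """Estimate y from x using the positive root of the quadratic in y."""
    disc = B * B - 4 * C * (A - x)
    if disc < 0:
        raise ValueError("x lies to the left of the parabola's vertex")
    return (-B + math.sqrt(disc)) / (2 * C)


for x in (1, 9, 20):
    print(x, round(y_from_x(x), 2))
```

The guard on the discriminant marks the smallest x for which the fitted curve is defined; below it the form 2 parabola has no real y.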

This equation has been graphed in Fig. 33. The reason for selecting
the form 2 parabola is that as larger and larger values of x are used,
the value of y continues to increase, although at a decreasing rate.
Such would not have been true had a form 1 parabola been used. This
problem* is discussed in the section of this Memorandum on analytic
geometry.

* See pp. 19-22.

THE EXPONENTIAL

In its simplest form, the equation of the exponential is

y = e^x,

or

y = 10^x,

depending on whether base e or 10 is preferred. (Recall the earlier
discussion of the advantages of each.)

... greater or less than x, depending on the value of b. For
illustrative purposes, we will work exclusively with

y = 10^{a + bx},

although those who prefer to use base e may do so, since the
procedures are the same.

Unfortunately there is no direct least-squares solution for fitting
a curve of this type. There are iterative methods that can be used to
approximate a least-squares solution, but they require a large
computer to be of practical use.*

The usual method is to transform the exponential into a linear
equation by taking the logarithms of each side as follows:

log y = a + bx.

We then substitute the logarithms of the y values for the actual
values and employ the least-squares normal equations for fitting a
straight line. It should be noted, however, that this method does not
yield the same least-squares solution for a and b as the exponential
form does. The criterion of least squares is applied to the
logarithms, not to the actual values of y, which results in
minimization of the relative rather than the absolute deviations.**
The fitted line is also higher than would be the case had the
least-squares criterion been applied to the actual values. A similar
phenomenon occurs when the method of averages is used.

* C. A. Graver and H. E. Boren, Multivariate Logarithmic and
Exponential Regression Models, The RAND Corporation, RM-4879-PR, July
1967; in this Memorandum, the term "exponential" applies to the power
form used in this text.

** Ibid. This approach is fine when one wants to minimize relative
rather than absolute differences. One could argue that such is the
case for most cost-analysis problems.

To illustrate the least-squares method we apply it to the data in
Table 8. Notice that the first step is to obtain the logarithms of the
y values. From that point on, the calculations required are as
indicated in the column headings. The normal equations are the same as
for the linear case with log y substituted for y:

\sum log y = Na + b \sum x,
\sum x log y = a \sum x + b \sum x^2.

Table 8

USING THE METHOD OF LEAST SQUARES TO FIT
THE EXPONENTIAL: WORKSHEET
(semi-log form)

    y    log y     x   x^2   x log y   log y (calc)   y (calc)   Percent Deviation
   30   1.4771     1     1    1.477       1.387         24.4          18.7
   19   1.2788     2     4    2.558       1.280         19.0           0.0
   15   1.1761     3     9    3.528       1.172         14.9           0.7
   10   1.0000     4    16    4.000       1.065         11.6         -16.1
    9   0.9542     5    25    4.771       0.958          9.1          -1.1
    6   0.7782     6    36    4.669       0.851          7.1         -18.3
    5   0.6990     7    49    4.893       0.744          5.5         -10.0
    4   0.6021     8    64    4.816       0.636          4.3          -7.5
    4   0.6021     9    81    5.419       0.529          3.4          15.0
    3   0.4771    10   100    4.771       0.422          2.6          13.3
  105   9.0447    55   385   40.902        --            --           10.1 av

Substituting the appropriate values from Table 8 yields

 9.0446 = 10a +  55b,
40.902  = 55a + 385b,

which when solved result in

a = 1.494,
b = -0.1072.

The desired equation is therefore either

log y = 1.494 - 0.1072x,

or

y = 10^{1.494 - 0.1072x}.

The  graph  of  this  solution  is  shown  in  Fig.  34.  Because  only  the  left- 
hand  member  of  the  equation  is  expressed  in  logarithms,  this  solution 
is  often  called  the  semi-log  form. 
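The semi-log fit can be sketched in a few lines: take common logarithms of y, then apply the straight-line normal equations. The function names `fit_line` and `predict` are illustrative assumptions; the data are the y and x columns of Table 8.

```python
import math

x = list(range(1, 11))
y = [30, 19, 15, 10, 9, 6, 5, 4, 4, 3]


def fit_line(xs, ys):
    """Least-squares intercept and slope from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(u * v for u, v in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


# The criterion is applied to log y, so relative deviations are minimized.
log_y = [math.log10(v) for v in y]
a, b = fit_line(x, log_y)


def predict(xv):
    """Transform the fitted straight line back to y = 10**(a + b*x)."""
    return 10 ** (a + b * xv)


print(f"log y = {a:.3f} + ({b:.4f})x")
```

Replacing `math.log10` with `math.log` (and `10 **` with `math.exp`) gives the base-e version; as the text notes, the procedure is otherwise the same.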

When the log transformation of y is not entirely sufficient to
straighten out the data, adding or subtracting a constant from the
value of y may help. The equation that results when the constant is
used is

y - d = 10^{a + bx},

or

y = 10^{a + bx} + d,

and in semi-log form:

log (y - d) = a + bx.

The value of the constant can be found by trial and error, but is more
conveniently estimated using the following procedure. The data are
plotted as in Fig. 35(a) and a freehand curve is drawn. Three points
are selected such that two lie at the extremities of the curve and the
third lies halfway between. If the first two points have coordinates
x_1, y_1 and x_2, y_2, then x_3 will be equal to (x_1 + x_2)/2. The y
coordinates for each of these points are read from the curve and
substituted in the equation

d = \frac{y_1 y_2 - y_3^2}{y_1 + y_2 - 2y_3},

and d is estimated. See Appendix B for the derivation of this formula.

Fig. 35 -- Determining the constant d using the method of least squares

To illustrate, the three points read from the curve in Fig. 35a are

P_1 = (x_1, y_1) = (1, 29),
P_2 = (x_2, y_2) = (10, 3),
P_3 = (x_3, y_3) = (5.5, 7.2),

and

d = \frac{(29)(3) - (7.2)^2}{29 + 3 - (2)(7.2)},

d = 2.0.

The value of d is subtracted from each value of y in the data
and the logs of (y - d) are determined. The two steps are shown in
Table 9. From that point on, the steps are the same as used in the
semi-log or exponential case. When the appropriate values are
calculated and the normal equations solved, the results are

a = 1.559,
b = -0.1522,
d = 2.00,
Table 9

USING THE METHOD OF LEAST SQUARES TO FIT THE EXPONENTIAL
WITH THE CONSTANT d: WORKSHEET

    y   (y-d)  log(y-d)    x   x^2  x log(y-d)  log(y-d) (calc)  (y-d) (calc)  y (calc)  Percent Deviation
   30     28    1.4472     1     1     1.447         1.407                                      8.3
   19     17    1.2304     2     4     2.461         1.255                                      5.3
   15     13    1.1139     3     9     3.342         1.103                                      2.0
   10      8    0.9031     4    16     3.612         0.950            8.9         10.9         -9.0
    9      7    0.8451     5    25     4.225         0.798            6.3          8.3          7.8
    6      4    0.6021     6    36     3.612         0.646            4.4          6.4         -6.7
    5      3    0.4771     7    49     3.340         0.494            3.1          5.1         -2.0
    4      2    0.3010     8    64     2.408         0.342            2.2          4.2         -5.0
    4      2    0.3010     9    81     2.709         0.189            1.5          3.5         12.5
    3      1    0.0000    10   100     0.000         0.037            1.1          3.1         -3.3
  105     85    7.2209    55   385    27.156          --              --           --           6.19 av

and the estimating equation is

log (y - 2.00) = 1.559 - 0.1522x,

or

y = 10^{1.559 - 0.1522x} + 2.00.

The results are shown plotted on arithmetic grids in Fig. 35a,
and (y - d) and y are plotted on semi-logarithmic grids in Fig. 35b.
The extent to which the addition of the constant d improved the
situation can be seen by comparing the average deviation of 6.19
calculated using the constant d with an average deviation of 10.2
calculated in the straight semi-log example. The same data were used
in both cases.
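The whole procedure, estimating the constant from three curve readings and then refitting the shifted data, can be sketched as below. The symbol d for the constant, the function names, and the rounding of the estimate to one decimal place (the text uses 2.0) are assumptions made for illustration.

```python
import math

x = list(range(1, 11))
y = [30, 19, 15, 10, 9, 6, 5, 4, 4, 3]


def shift_constant(y1, y2, y3):
    """Constant from curve readings at x1, x2 and their midpoint x3."""
    return (y1 * y2 - y3 ** 2) / (y1 + y2 - 2 * y3)


def fit_line(xs, ys):
    """Least-squares intercept and slope from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(u * v for u, v in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - b * sx) / n, b


def avg_abs_pct_dev(predict):
    """Mean of |actual - calculated| / actual, expressed in percent."""
    return sum(abs(yi - predict(xi)) / yi * 100 for xi, yi in zip(x, y)) / len(x)


d_est = shift_constant(29, 3, 7.2)   # readings at x = 1, 10 and 5.5
d = round(d_est, 1)                  # rounded as in the worked example

a, b = fit_line(x, [math.log10(yi - d) for yi in y])   # shifted semi-log fit
a0, b0 = fit_line(x, [math.log10(yi) for yi in y])     # plain semi-log fit

dev_shifted = avg_abs_pct_dev(lambda xv: 10 ** (a + b * xv) + d)
dev_plain = avg_abs_pct_dev(lambda xv: 10 ** (a0 + b0 * xv))
print(f"d = {d_est:.2f}, a = {a:.3f}, b = {b:.4f}")
print(f"average deviation: {dev_shifted:.1f} with d, {dev_plain:.1f} without")
```

Note that the constant must stay below the smallest y value, otherwise log (y - d) is undefined for that point.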




THE POWER FUNCTION

The general equation of the power function is

y = ax^b.

As was true of the exponential, there is no direct least-squares
solution for fitting the power function. Iterative methods can be used
to achieve quite close approximations, but require such extensive
calculation that they are only practical when a computer is
available.* The usual practice is to transform the power function by
taking the logs of both sides as

log y = log a + b log x.

The result is a linear equation in terms of the logarithms of both x
and y. When this transformation is reflected in the data by
substituting the log of y for y and the log of x for x, the
appropriate values for the example shown in Table 10 and Fig. 36a may
be calculated and used in the normal equations for a straight line as
follows:

\sum log y = N log a + b \sum log x,
\sum log x log y = log a \sum log x + b \sum log^2 x,

9.6245 = 10 log a + 10.8786b,
8.5378 = 10.8786 log a + 14.3460b,

log a = 1.7994,
b = -0.7694.

* Graver and Boren, p. 75.
We must recognize that, as before, the result is a least-squares fit
in terms of the logs rather than the actual values of y. The line will
be placed such that the relative, not the absolute, deviations have
been minimized.
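The log-log fit can be sketched in the same way as the semi-log case, with both variables transformed. The x and y lists below are read from Table 10, whose printed copy is partly illegible; the values used are the ones consistent with the column totals (221 and 140) and should be treated as an assumption. The function name `fit_line` is also illustrative.

```python
import math

# Data as read from Table 10 (reconstructed so the column totals agree).
x = [2, 3, 5, 6, 10, 15, 20, 40, 50, 70]
y = [50, 25, 20, 13, 10, 6, 6, 4, 3, 3]


def fit_line(xs, ys):
    """Least-squares intercept and slope from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(u * v for u, v in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - b * sx) / n, b


log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]
log_a, b = fit_line(log_x, log_y)   # the intercept is log a, not a
a = 10 ** log_a

print(f"log y = {log_a:.4f} + ({b:.4f}) log x")
print(f"y = {a:.2f} x^{b:.4f}")
```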


Table 10

USING THE METHOD OF LEAST SQUARES TO FIT
THE POWER FUNCTION: WORKSHEET

    y    log y     x    log x   log^2 x  log x log y  log y (calc)  y (calc)  Percent Deviation
   50   1.6990     2   0.3010   0.0906     0.5114        1.5678      36.98          26.0
   25   1.3979     3   0.4771   0.2276     0.6669        1.4324      27.07          -8.3
   20   1.3010     5   0.6990   0.4886     0.9094        1.2617      18.27           8.7
   13   1.1139     6   0.7782   0.6056     0.8668                    15.88         -22.2
   10   1.0000    10   1.0000   1.0000     1.0000                    10.72          -7.2
    6   0.7782    15   1.1761   1.3832     0.9152        0.8946       7.84         -30.7
    6   0.7782    20   1.3010   1.6926     1.0124        0.7984       6.29          -4.8
    4   0.6021    40   1.6021   2.5667     0.9646        0.5668       3.69           7.8
    3   0.4771    50   1.6990   2.8866     0.8106        0.4923       3.11          -3.7
    3   0.4771    70   1.8451   3.4044     0.8803        0.3798       2.40          20.0
  140   9.6245   221  10.8786  14.3459     8.5377         --          --            13.94 av
In this case, it is easier to plot the data on logarithmic
coordinate paper, and to draw the smooth curve as before. We select
three points falling on the curve, two at the extremities and one in
between such that its x coordinate is the geometric mean of the x
coordinates of the other two points, as

x_3 = \sqrt{x_1 x_2}.

The entire procedure is illustrated in Table 11 and by Figs. 36(b),
37(a) and 37(b). The extent to which the addition of the constant d
improved the result can be seen by comparing the average deviations.

* See Appendix B.

The calculation of d is as follows:

P_1 = (x_1, y_1) = (2, 46),
P_2 = (x_2, y_2) = (50, 3.2),
x_3 = \sqrt{(2)(50)} = 10,
P_3 = (x_3, y_3) = (10, 9.5),

d = \frac{y_1 y_2 - y_3^2}{y_1 + y_2 - 2y_3}
  = \frac{(3.2)(46) - (9.5)(9.5)}{3.2 + 46 - 2(9.5)},

d = 1.89.

The normal equations and their solution are given below:*

\sum log (y - d) = N log a + b \sum log x,
\sum (log x)[log (y - d)] = log a \sum log x + b \sum log^2 x,

7.9013 = 10 log a + 10.8785b,
5.9598 = 10.8785 log a + 14.3457b,

log a = 1.9318,
b = -1.0494.

The equation is

log (y - d) = 1.9318 - 1.0494 log x,

or as a power function,

y = 83.35x^{-1.0494} + 1.89.

* See example, p. 78.
Table 11

USING THE METHOD OF LEAST SQUARES TO FIT THE POWER FUNCTION
WITH THE CONSTANT d: WORKSHEET

Fig. 37 -- Power function with constant d fitted
using the method of least squares
III. THREE-VARIABLE CURVE FITTING

THE LINEAR CASE

An empirical equation used to describe a three-variable linear
relationship has the general form

X_1 = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3,

which is a simple extension of the two-variable linear (straight line)
equation previously discussed.* To be consistent with the
three-variable equation, we will write the two-variable relationship
as:

X_1 = a_{1.2} + b_{12}X_2.

We have already learned that the constant term, a_{1.2}, was the value
of X_1 when X_2 was equal to 0. We further learned that b_{12} was
called the slope of the straight line and that, depending on whether
b_{12} was positive or negative, the value of b_{12} determined the
extent to which X_1 would be increased or decreased with changes in
X_2.

* When writing multi-variable equations, it is conventional to
use the subscripted X_i as above rather than x and y as has been done
previously.

The three-variable relationship may be thought of as two
two-variable relationships interacting with each other. For example,

X_1 = a_{1.2} + b_{12}X_2

and

X_1 = a_{1.3} + b_{13}X_3

are two separate two-variable linear relationships, the first
describing the impact of X_2 on the value of X_1 and the second, the
impact of X_3 on X_1. In the first relationship, however, the extent
to which X_3 influences the value of X_1 is not accounted for, nor, in
the second, is the extent to which X_2 influences the value of X_1.
What we really need is a relationship between X_1 and X_2 and between
X_1 and X_3 where in each case the effect of the other independent
variable on X_1 has been eliminated. Assuming that it is possible to
obtain these, we write

X_1 = a_{1.23} + b_{12.3}X_2

and

X_1 = a_{1.23} + b_{13.2}X_3,

where subscripts indicate the variable whose effect has been
eliminated. In the equation above, a is identified by the subscript
1.23, indicating that a is the value of X_1 once the effects of X_2
and X_3 have been eliminated. Since a is a constant, the relationship
is a simple one; when X_2 and X_3 are eliminated from consideration,
X_1 is in fact equal to a_{1.23}. In this equation the slope b is
subscripted 12.3, indicating that b_{12.3} is the net slope of the
relationship between X_1 and X_2 independent of the impact of X_3.
The numbers to the left of the decimal point in the subscript identify
which two variables are being related; those to the right identify the
variables whose effects have been eliminated. The subscripts used
follow a logical pattern, and in fact this scheme of subscripting is
often extended to relationships of four or more variables.

Now, given that in each of the two straight line relationships
shown above we have pure relationships (net relationships) between
X_1 and each of the independent variables, and given that the two
independent variables completely determine the value of X_1, it is
proper to combine them to write

X_1 = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3;

this is the three-variable linear relationship with which we began.

The coefficients of X_2 and X_3, b_{12.3} and b_{13.2}, are
frequently referred to as net regression coefficients and are in fact
the slopes of the two separate straight lines described above. Each
describes the impact of its accompanying variable on the dependent
variable X_1. The constant a_{1.23} is simply interpreted as the value
of X_1 when both X_2 and X_3 are equal to 0.

To explore the idea of a net regression coefficient further and,
at the same time, to illustrate one way that this type of relationship
can be fitted to actual data, we will use the following example. In
this case, we will begin with the answer and use a curve-fitting
technique to see how closely we can reproduce it.

Assume that we are going to publish a technical report and we
are concerned about the cost consequences of including various
combinations of illustrations and plain printed pages. We contact a
number of prospective printers and find that, on the average, for each
report printed, there are three charges: a fixed charge of $1.00;
a charge of $0.10 per illustration; and $0.04 per printed page. The
charges may be more concisely stated in the following three-variable
linear relationship:

C = $1.00 + $0.10I + $0.04P,

where C = the cost per report,
      I = the number of illustrations per report,
      P = the number of printed pages per report.

At this point, we arbitrarily select a number of possibilities,
choosing some with differing numbers of printed pages and a fixed
number of illustrations and others with varying numbers of
illustrations and the same number of printed pages. Further, for each
combination chosen, we use the above cost equation to determine what
it would cost to print the particular report. We select twelve reports
as shown in Table 12, each with a different combination of
illustrations and printed pages, and determine the printing cost of
each.


Table 12

DATA ON SELECTED REPORTS

  Report No.   No. of Illustrations   No. of Printed Pages   Cost to Print per Copy ($)
                       (I)                    (P)                      (C)
      1                 1                     18                      1.82
      2                 2                      4                      1.36
      3                 2                     10                      1.60
      4                 2                     20                      2.00
      5                 3                     15                      1.90
      6                 4                     13                      1.92
      7                 5                      7                      1.78
      8                 5                     16                      2.14
      9                 6                      6                      1.84
     10                 6                      2                      1.68
     11                 7                      1                      1.74
     12                 7                      7                      1.98
Taking Report No. 3 as an example, we can see that the cost of $1.60
is arrived at as follows:

    Fixed charge .................... $1.00
    Illustrations (2) @ $0.10 .......   .20
    Printed pages (10) @ $0.04 ......   .40
    Total ........................... $1.60


Let  us  now  assume  that,  instead  of  having  the  equation  which 
allowed  us  to  calculate  the  costs  above,  we  have  only  the  data  contained 
in  Table  12  and  we  wish  to  find  the  equation.  In  such  an  example 
(which  is  unlike  the  usual  case)  we  will  assume  that  we  know  the  price 
to  be  influenced  only  by  the  two  variables,  number  of  illustrations 
(I)  and  number  of  printed  pages  (P) . 
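For orientation, the sketch below first regenerates the Table 12 costs from the stated charges and then recovers the three coefficients from the data alone. It uses three-variable least squares, which is a swapped-in technique: the text itself goes on to use successive approximations with the method of averages. The helper names (`cost`, `solve`, `reports`) are illustrative assumptions.

```python
# (illustrations, printed pages) for the twelve reports of Table 12.
reports = [(1, 18), (2, 4), (2, 10), (2, 20), (3, 15), (4, 13),
           (5, 7), (5, 16), (6, 6), (6, 2), (7, 1), (7, 7)]


def cost(i, p):
    """Cost per report: fixed charge plus per-illustration and per-page charges."""
    return 1.00 + 0.10 * i + 0.04 * p


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


costs = [round(cost(i, p), 2) for i, p in reports]

# Normal equations for C = a + b*I + c*P.
n = len(reports)
s_i = sum(i for i, _ in reports)
s_p = sum(p for _, p in reports)
s_ii = sum(i * i for i, _ in reports)
s_pp = sum(p * p for _, p in reports)
s_ip = sum(i * p for i, p in reports)
s_c = sum(costs)
s_ic = sum(i * c for (i, _), c in zip(reports, costs))
s_pc = sum(p * c for (_, p), c in zip(reports, costs))

a, b, c = solve([[n, s_i, s_p], [s_i, s_ii, s_ip], [s_p, s_ip, s_pp]],
                [s_c, s_ic, s_pc])
print(f"C = {a:.2f} + {b:.2f}I + {c:.2f}P")
```

Because the costs were generated without error from a linear rule, the fit reproduces the charges essentially exactly; with real data the recovered coefficients would only approximate the underlying relationship.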

As has been our practice in the past in attacking such problems,
we begin by constructing scatter diagrams, but, because it is
difficult (although possible)* to construct three-dimensional scatter
diagrams, we will be content with the more usual two-dimensional
diagrams. In doing this, let us think in terms of the two two-variable
straight lines discussed earlier. We begin by plotting the cost (C)
against the number of illustrations (I) on one graph and the cost (C)
against the number of printed pages (P) on the other. The first two
diagrams (a and b) in Fig. 38 show the results. As we should have
expected, in neither case do we see a clearly defined relationship.
Any relationship that might exist between cost and the number of
illustrations is obviously distorted by the fact that reports with the
same number of illustrations have different numbers of printed pages.
For example, there are three reports each with two illustrations,
but one has four, one has ten, and one twenty printed pages. The
number of illustrations similarly distorts the relationship between
cost and number of printed pages shown in Fig. 38b.

* W. A. Spurr and C. P. Bonini, Statistical Analysis for Business
Decisions, Richard D. Irwin, Inc., Homewood, Illinois, 1967, p. 592.

Even with all of the distortion present, it is possible to see a
general upward trend in Fig. 38b. As the number of printed pages
increases there is a commensurate increase in the cost. Our
curve-fitting technique will be to capitalize on this by fitting a
straight line to the data plotted in Fig. 38b and to use the results
to improve the relationship between cost and the number of
illustrations. For simplicity we will use the method of averages to
fit the straight line and the point-slope formula to write the
required equation. The details of these and other required
computations are shown in Table 13. When using the method of averages,
the data are first ordered according to the value of the independent
variable (see Columns a, b, c, d of Table 13). Because there are two
independent variables involved and because the data cannot be ordered
according to both of them at the same time, two separate setups are
required. Those calculations that require ordering according to number
of illustrations are shown on the upper half of Table 13, and those
that require ordering according to number of printed pages are shown
on the lower half of Table 13. Since the sequence requires stepping
back and forth between the upper and the lower half, the steps are
indicated by the numbers shown in circles at the head of each column.

The calculations of the average points for fitting the first
straight line (between cost and number of printed pages) are shown in
the lower half of the table in Column 1. The coordinates of the two


Tabl 


USING  SUCCESSIVE  APPROXIMATIONS  AND  THE  METHOD  OF  AV 


Expression:  C  = 


(d) 

Cost  per 
Report  (C) 

© 

<<•’> 

© 

«’2> 

© 

<rS 

© 

((•*) 

O 

(r5) 

■ 

(ri(.  38») 

(Fig.  38c) 

'Fig.  38*) 

(Fig.  38g) 

1.82 

1.54 

1.77 

1.31 

1.74 

1.20 

1.36 

1.30 

1.25 

1.25 

1.21 

1.22 

1.60 

1.44 

1.49 

1.31 

1.45 

1.26 

2.00 

1.69 

1.89 

1.43 

1.85 

1.32 

1.90 

1.66 

1.74 

1.47 

1.67 

1.39 

1.92 

1.72 

1.-70 

1.55 

1.61 

1.48 

Av.  ■  1.56 

Av.  -  1.39 

Av.  *  1.31 

1.78 

1.67 

1.51 

1.58 

1.40 

1.54 

2.U 

1.89 

1.87 

1.68 

1.76 

1.59 

1.68 

1.65 

1.35 

1.62 

1.22 

1.61 

1.84 

1.75 

1.51 

1.67 

1.38 

1.63 

1.74 

1.72 

1.36 

1.71 

1.21 

1.71 

1.98 

l  .87 

1.60 

1.78 

1.45 

1.74 

Av.  -  1.76 

Av.  ■  1.67 

Av .  -  1.64 

A  -  0.20 

A  -  0.28 

0-0.33  , 

r'-l .433^0.05457 

'2-,  ’-0.054  3 7 

’’-1.212+0.07637 

C*->0. 07637 

C5- 1.1 00+0. 08997 

C 

! 

® 

© 

1 

® 

1 

0 

■ 

(.•) 

(•'» 

(,-’> 

■HI 

(Fig.  38b) 

(Fig.  3*d) 

(Fig.  387) 

(1 

1.74 

1.72 

1.36 

1.71 

1.21 

1.71 

1.68 

1  .65 

1.35 

1.62 

1.22 

1.61 

1.36 

1.30 

1.25 

1.25 

1.21 

1  2? 

1.84 

1.73 

1.51 

1.67 

1.38 

1.6) 

1.78 

1.67 

1.51 

1.58 

1.40 

1.54 

1.98 

1 .87 

1.60 

1 .  78 

1  .45 

1.74 

Av.  -  1.73 

Av.  -  1.43 

Av.  -  1.31 

* 

1.60 

1.44 

1.49 

1 .31 

1.45 

1.26 

1.92 

1.72 

1.70 

1.55 

1.61 

1.48 

1.90 

1.66 

1.74 

1.47 

1.67 

1.39 

2  .14 

1.89 

1 .87 

1.68 

1.76 

1.59 

1.92 

1.54 

1.77 

1.31 

1  .  74 

1.20 

2.00 

1.69 

1.89 

1.43 

1.85 

1.  32 

-*1.90 

Av.  -  1.74 

Av .  -  1.68 

* 

A  -  0.17 

A  -  O.li 

A  -  0.37 

i\-1.65*t0.0157F 

'•  ’-0.0137;' 

’2- 1.301*0.0286/’ 

'3-,  ’-0.0286!’ 

*-1.136+0.03*27' 

’  ->0.03*2/’ 

"  9 
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points  (P  j ,  C 
spec  t ively . 

To  simplify  c 
slope  formula 


or 

where 


In  the  first 

■  nd 

Because  •'  (  - 
are  entered 
di rated  by  ' 


and  (P^,  C7)  are  (4.50,  1.73)  and  ( 
The  modified  point-slope  formula  is 


alculation  with  a  desk  calculator,  the 
above  was  recast  as  follows: 


=  a  +  t  , 


case,  the  values  are  substituted  and 


( i_. :  m i s.  i n  -jh goj_(s._3o> 
15.11  -  4 . 40 


1  -  1 .  -_i 

fs.  u  -  T!*50 


1 ' .  i 1 1 4  7 . 


•  and  1  are  most  easilv  oa inula 
n  the  table,  thev  should  v'0  done  at  t> 
entered  in  the  appropriate  column. 


15.33,  1.90;  re¬ 


modified  point- 


?)  \  '1 


ited  as  the  nunhers 
wit  time  and  in- 
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The  equati.1  oi  the  straight  line  &< cribing  the  relationship 
between  cost  and  number  of  printed  pages  is  thus 

=*  1.659  +  0.0157  . 

The  subscript  is  used  to  indicate  that  values  of  '  calculated  from 
this  equation  are  estimates  rather  than  actuals.  When  this  equation 
is  plotted  as  in  Fig.  38b,  it  gives  a  rough  approximation  of  the  true 
relationship.  However,  a  rough  approximation  is  better  than  none,  as 
we  shall  subsequently  see. 
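The arithmetic behind this first fitted line can be checked with a short Python sketch. It applies the recast point-slope formula to the two average points quoted above; the function name `line_through` is an illustrative assumption and does not come from the Memorandum.

```python
def line_through(p1, p2):
    """Return (a, b) for the line C = a + b*P through two points (P, C)."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)              # slope: rise over run
    a = (y1 * x2 - y2 * x1) / (x2 - x1)    # intercept, in the recast form
    return a, b

# Average points of the two halves of the data, ordered by printed pages.
a, b = line_through((4.50, 1.73), (15.33, 1.90))
print(f"C = {a:.3f} + {b:.4f} P")
```

With the rounded average points used in the text, this should reproduce the constant 1.659 and the slope 0.0157.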

At the moment, the value of the constant $a$ is of no interest.
The value of $b$, 0.0157, means that for each printed page we must add
1.57 cents to the cost. We can reduce the cost of each case by this
figure in proportion to the number of printed pages, and then examine
these results with respect to the number of illustrations. The
adjustment is made by setting

\[ C^1 = C - 0.0157P. \]

For report No. 11 the result would be

\[ C^1 = 1.74 - (0.0157)(1), \]

\[ C^1 = 1.72, \]

as shown in Column 2 in the lower half of Table 13. We next make the
same reduction in cost for each report in proportion to the number of
printed pages. When this has been completed, we transfer the results
to Column 3, in the upper portion of the table, and simultaneously
reorder them according to the number of illustrations in each case.

We indicate these values by the symbol $C^1$, where 1 (known as a
superscript, not an exponent) signifies the first adjustment to the
original costs.

When we have plotted these adjusted costs against the number of
illustrations as in Fig. 38c, we have a more definite relationship than
that indicated in Fig. 38a. What has happened is this: Although the
equation relating cost to number of printed pages was extremely rough,
it was sufficient to eliminate enough of the effect of printed pages
from $C$ to clear up the relationship between $C$ and $I$.

The next step is to follow our logic and determine the relationship
between $C^1$ and $I$, using the results to further clean up the
relationship between cost and number of printed pages. Once again, we
employ the method of averages, placing the results in Column 3 in the
upper portion of Table 13. This fitted line can be seen plotted in
Fig. 38c. The equation of the fitted line is

\[ C^1_c = 1.433 + 0.0545I. \]

This equation gives us an approximation of the impact of the number
of illustrations on cost, in this case 5.45 cents per illustration.

The costs are again adjusted as in Column 4 in the upper portion of
Table 13, this time to eliminate the effect of the number of
illustrations according to the approximation given above. This
adjustment is made according to the formula

\[ C^2 = C - 0.0545I, \]

where $C^2$ indicates that the cost has been adjusted for the second time.

The adjusted figures are next transferred to Column 5, lower half
of Table 13, and the results plotted against the number of printed
pages as in Fig. 38d. A comparison of Fig. 38d with Fig. 38b shows
the extent to which our first approximation of the cost of
illustrations has improved the relationship between total cost and
number of printed pages. This process of refining the approximations
is continued first with respect to one of the independent variables
and then the other. Each time an approximate relationship is obtained
it is used to further adjust the cost; the adjusted cost is then
related to the other independent variable and the process repeated.
The calculations in Table 13 follow the adjustment process through
thirteen times. The calculations of Columns 3 through 8 and Columns 12
and 13 in Table 13 are illustrated by Fig. 38d through Fig. 38m.
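The alternating adjustment described above can be sketched in a few lines of Python. The sketch below is a minimal illustration, not the Memorandum's worksheet. The (I, P) pairs are an assumption: they are inferred from the tabulated costs and from the sums appearing in the least-squares normal equations, since those columns are not legible in this copy of Table 13. The code also carries full precision, so its first-pass slopes differ slightly from the two-decimal desk-calculator values in the table.

```python
# (illustrations I, printed pages P, cost per report C).
# Assumption: I and P are inferred from the printed costs; see lead-in.
REPORTS = [
    (7, 1, 1.74), (6, 2, 1.68), (2, 4, 1.36), (6, 6, 1.84),
    (5, 7, 1.78), (7, 7, 1.98), (2, 10, 1.60), (4, 13, 1.92),
    (3, 15, 1.90), (5, 16, 2.14), (1, 18, 1.82), (2, 20, 2.00),
]

def mean(values):
    return sum(values) / len(values)

def method_of_averages(xs, ys):
    """Fit y = a + b*x through the average points of the two halves."""
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])   # order by x
    half = len(pairs) // 2
    lower, upper = pairs[:half], pairs[half:]
    x1, y1 = mean([x for x, _ in lower]), mean([y for _, y in lower])
    x2, y2 = mean([x for x, _ in upper]), mean([y for _, y in upper])
    b = (y2 - y1) / (x2 - x1)
    return y1 - b * x1, b

I = [r[0] for r in REPORTS]
P = [r[1] for r in REPORTS]
C = [r[2] for r in REPORTS]

b_I = 0.0          # no allowance for illustrations on the first pass
history = []
for _ in range(30):
    # Remove the current illustration effect, then fit against pages.
    _, b_P = method_of_averages(P, [c - b_I * i for c, i in zip(C, I)])
    # Remove the new page effect, then fit against illustrations.
    _, b_I = method_of_averages(I, [c - b_P * p for c, p in zip(C, P)])
    history.append((b_P, b_I))

a = mean([c - b_I * i - b_P * p for c, i, p in zip(C, I, P)])
print(f"first pass: b_P = {history[0][0]:.4f}, b_I = {history[0][1]:.4f}")
print(f"final:      C = {a:.3f} + {b_I:.3f} I + {b_P:.3f} P")
```

Under these assumptions each full cycle shrinks the error in the slopes by a roughly constant factor, which is why a handful of passes already brings the coefficients close to 0.10 and 0.04.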

The relationship between cost and number of printed pages shown
in Fig. 38k, which was arrived at on the 12th adjustment, can be
described by the linear equation

\[ C^{12}_c = 1.0113 + 0.0397P. \]

This equation is quite close to that portion of the original equation
dealing with printed pages,

\[ C = 1.00 + 0.04P. \]

The relationship between cost and number of illustrations shown
in Fig. 38m is also quite close to the relevant part of the original
equation:

\[ C^{13}_c = 1.0144 + 0.0981I, \]

as compared to

\[ C = 1.00 + 0.10I. \]

When the two two-variable equations are combined as

\[ C_c = 1.01 + 0.098I + 0.0397P, \]

we have a very close representation of the original equation

\[ C = 1.00 + 0.10I + 0.04P. \]

Had  we  continued  with  our  process  of  successive  approximation  and 
adjustment,  we  could  conceivably  have  reproduced  the  original  equation 
exactly.  But  this  would  have  meant  carrying  the  calculations  to  more 
significant  digits  which  was  unnecessary  for  the  purposes  of  this 
example.  This  method  of  curve  fitting,  quite  appropriately  called 
the  Method  of  Successive  Approximations,  can  be  used  quite  generally — 
even  in  cases  where  the  separate  relationships  can  only  be  described 
by  freehand  non-mathematically  describable  curves. 

Fortunately the method of least squares accomplishes similar
results for the three-variable linear relationship by means of a
direct and absolute rather than an approximate solution. To show that
both methods result in the same solution, the method of least squares
is next applied to the same problem. Data are calculated in Table 14,
and the accompanying graphs plotted in Fig. 39. Normal equations for
this solution are as follows:

\[ \sum C = Na + b_1 \sum I + b_2 \sum P, \]

\[ \sum IC = a \sum I + b_1 \sum I^2 + b_2 \sum IP, \]

\[ \sum PC = a \sum P + b_1 \sum IP + b_2 \sum P^2. \]

This works out to

\[ 21.76 = 12a + 50b_1 + 119b_2, \]

\[ 91.88 = 50a + 258b_1 + 402b_2, \]

\[ 224.36 = 119a + 402b_1 + 1629b_2. \]

Therefore the solution is

\[ C_c = 1.000 + 0.10I + 0.04P. \]
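The three normal equations above form an ordinary 3-by-3 linear system, so the solution can be verified mechanically. The Python sketch below uses plain Gaussian elimination with partial pivoting; the helper name `solve` is an illustrative assumption, and any standard linear-equation routine would serve equally well.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Bring the largest remaining entry in this column to the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):          # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

# Coefficients and right-hand sides of the normal equations in the text.
A = [[12, 50, 119],
     [50, 258, 402],
     [119, 402, 1629]]
rhs = [21.76, 91.88, 224.36]

a, b1, b2 = solve(A, rhs)
print(f"C = {a:.3f} + {b1:.2f} I + {b2:.2f} P")
```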


Fig. 39--Least-squares solution to the three-variable problem
showing one way to graph a three-variable equation

THE NONLINEAR CASE

It is not unusual to encounter sets of three or more variables
that cannot be adequately described using linear relationships, and
that require nonlinear curve fitting. In this section of the
Memorandum we will use the method of least squares to fit the straight
line, the exponential, the power function, and the parabola to a set
of one dependent and two independent variables.

Fitting a three-variable linear equation using the method of
least squares has already been described. We remember that the linear
equation

\[ X_1 = a + b_2 X_2 + b_3 X_3 \]

resulted from the two two-variable equations

\[ X_1 = a + b_2 X_2 \]

and

\[ X_1 = a + b_3 X_3, \]

with each describing the relationship between the dependent variable
and either $X_2$ or $X_3$. In each case, the influence of the other was
not accounted for. In the combined relationship, $b_2$ and $b_3$ were
written $b_{12.3}$ and $b_{13.2}$ to show that in the first case the
effect of $X_3$ was eliminated, and that in the second case the effect
of $X_2$ was eliminated. The method of successive approximations was
used to demonstrate how this could be done. Further, it was shown that
the method of least squares produces the same answer with considerably
less effort.

We will now build on these fundamentals to illustrate three-variable
nonlinear curve fitting.

As there is nothing about the detailed calculations required here
that is different from those previously illustrated, we will not
describe them again. Instead, we will concentrate on showing how
various nonlinear functional forms can be used. In particular, we will
point up their peculiarities and consequently their limitations.

Twenty sets of the three variables ($X_1$, $X_2$, and $X_3$) are shown
in Table 15. $X_1$ is the dependent variable; $X_2$ and $X_3$ are the
independent variables. We will proceed to fit a linear, an
exponential, a power function, and a parabolic relationship to these
variables. Good practice dictates that we start by examining the data
more closely. As with the two-variable case, preparing a scatter
diagram is always a good beginning.

Figure 40 shows the results of plotting $X_1$ against $X_3$ while
ignoring $X_2$. Little more than a general scattering of points is
observed. But when each point is identified with its $X_2$ value and
contour lines connecting all points with equal values of $X_2$ are
drawn, as in Fig. 41, a relationship can be seen. For each value of
$X_2$, $X_1$ increases with increases in $X_3$.

Figure 42 shows similar results. Here $X_1$ is plotted against $X_2$
and contours connecting points having equal values of $X_3$ have been
drawn. For fixed values of $X_3$, $X_1$ increases with increases in
$X_2$. At this point, we also note a distinct curvature in one or two
of the contours, which suggests a nonlinear relationship between $X_1$
and $X_2$.

A point from which to compare the results of fitting nonlinear
relationships has been provided by fitting a linear relationship to
the data. The least-squares normal equations and the resulting linear
relationship follow:

\[ \sum X_1 = Na + b_2 \sum X_2 + b_3 \sum X_3, \]

\[ \sum X_1 X_2 = a \sum X_2 + b_2 \sum X_2^2 + b_3 \sum X_2 X_3, \]

\[ \sum X_1 X_3 = a \sum X_3 + b_2 \sum X_2 X_3 + b_3 \sum X_3^2, \]

\[ X_1 = -20.01 + 0.4998X_2 + 1.295X_3. \]

Table 15

RESULTS OF FITTING A THREE-VARIABLE LINEAR RELATIONSHIP
(X1 = -20.01 + 0.4998 X2 + 1.295 X3)

                                                            Percent
Observation     X1      X2     X3   X1(calc.)  X1-X1(calc.) Deviation

     1         7.31      5      5     -11.08      18.39       251.5
     2        37.67      5     49      45.92      -8.25       -22.0
     3        67.37      5     71      74.42      -7.05       -10.5
     4       121.31      5    100     111.99       9.32         7.7
     5        20.93     16     27      22.92      -1.99        -9.5
     6        24.77     27     27      28.42      -3.65       -14.7
     7        33.57     27     38      42.67      -9.10       -27.1
     8        22.78     38     16      19.66       3.12        13.7
     9        29.16     38     27      33.91      -4.75       -16.3
    10       118.26     38     93     119.41      -1.15        -1.0
    11        39.62     60     27      44.91      -5.29       -13.4
    12        45.68     71     27      50.41      -4.73       -10.4
    13       149.34     71    100     144.97       4.37         2.9
    14        41.97     82      5      27.40      14.57        34.7
    15        59.48     93     27      61.40      -1.92        -3.2
    16       148.58     93     93     146.90       1.68         1.1
    17       163.14     93    100     155.96       7.18         4.4
    18        73.14    100     38      79.15      -6.01        -8.2
    19       114.06    100     71     121.90      -7.84        -6.9
    20       153.44    100     93     150.40       3.04         2.0

                                                              23.0 av

Fig. 41--Scatter diagram: X1 vs X3 with contours
showing equal values of X2

Fig. 42--Scatter diagram: X1 vs X2 with contours
showing equal values of X3

In Fig. 43, $X_1$ is plotted against $X_3$, ignoring $X_2$. The
straight lines result from solving the linear equation above, allowing
$X_3$ to vary over its relevant range while holding $X_2$ constant at
the values indicated in Fig. 43. The deviations of the points from the
appropriate lines are indicated by the vertical connecting lines. As
was to be expected, the linear relationship does not describe the data
very well. A tabular presentation of the results was shown in Table 15.
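The linear fit can be reproduced from the twenty observations of Table 15 by forming and solving the normal equations. The Python sketch below is one way to do so; the helper names `solve` and `least_squares` are illustrative assumptions. Because the printed coefficients are rounded, small differences (particularly in the constant term) should be expected.

```python
# Observations from Table 15: (X1, X2, X3).
DATA = [
    (7.31, 5, 5), (37.67, 5, 49), (67.37, 5, 71), (121.31, 5, 100),
    (20.93, 16, 27), (24.77, 27, 27), (33.57, 27, 38), (22.78, 38, 16),
    (29.16, 38, 27), (118.26, 38, 93), (39.62, 60, 27), (45.68, 71, 27),
    (149.34, 71, 100), (41.97, 82, 5), (59.48, 93, 27), (148.58, 93, 93),
    (163.14, 93, 100), (73.14, 100, 38), (114.06, 100, 71), (153.44, 100, 93),
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def least_squares(rows, y):
    """Build the normal equations for the regressors in rows and solve them."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(A, rhs)

rows = [[1.0, x2, x3] for _, x2, x3 in DATA]
y = [x1 for x1, _, _ in DATA]
a, b2, b3 = least_squares(rows, y)

# Average absolute percent deviation, as in the last column of Table 15.
calc = [a + b2 * x2 + b3 * x3 for _, x2, x3 in DATA]
avg_pct = sum(abs(100 * (x1 - c) / x1) for (x1, _, _), c in zip(DATA, calc)) / len(DATA)
print(f"X1 = {a:.2f} + {b2:.4f} X2 + {b3:.3f} X3   avg |% dev| = {avg_pct:.1f}")
```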

Given the indications of nonlinearity in Figs. 41 and 42 and the
poorness of fit achieved with the linear form, a nonlinear form seems
in order. When confronted with a similar situation, analysts often
turn immediately to the power function on the grounds that it will
straighten anything out. We will try this and see what happens.

The basic power function in two variables is

\[ X_1 = aX_2^{b_2} \]

or

\[ X_1 = aX_3^{b_3}. \]

In logarithmic form these equations become

\[ \log X_1 = \log a + b_2 \log X_2 \]

and

\[ \log X_1 = \log a + b_3 \log X_3. \]

The transition from the two-variable equations to the one
three-variable equation is analogous to the equations presented in the
beginning of Section III. Either

\[ \log X_1 = \log a + b_2 \log X_2 + b_3 \log X_3 \]

or

\[ X_1 = aX_2^{b_2}X_3^{b_3} \]

is the required equation and, as can be seen, the equation is linear
in terms of the logarithms of the variables. The least-squares normal
equations used before are appropriate here, given that the logarithms
of the variables are substituted for the variables. For example,

\[ \sum \log X_1 = N \log a + b_2 \sum \log X_2 + b_3 \sum \log X_3, \]

\[ \sum \log X_1 \log X_2 = \log a \sum \log X_2 + b_2 \sum (\log X_2)^2 + b_3 \sum \log X_2 \log X_3, \]

\[ \sum \log X_1 \log X_3 = \log a \sum \log X_3 + b_2 \sum \log X_2 \log X_3 + b_3 \sum (\log X_3)^2. \]

When the required values are calculated and this set of equations
solved, the following power function results:

\[ \log X_1 = 0.16555 + 0.26963 \log X_2 + 0.73198 \log X_3, \]

or

\[ X_1 = 1.464X_2^{0.26963}X_3^{0.73198}. \]
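Since the power function is linear in the logarithms, the same normal-equation machinery applies once each variable is replaced by its common logarithm. The Python sketch below illustrates this on the Table 15 observations; the helper names are illustrative assumptions, and the results should agree with the printed coefficients only to within rounding.

```python
import math

# Observations from Table 15: (X1, X2, X3).
DATA = [
    (7.31, 5, 5), (37.67, 5, 49), (67.37, 5, 71), (121.31, 5, 100),
    (20.93, 16, 27), (24.77, 27, 27), (33.57, 27, 38), (22.78, 38, 16),
    (29.16, 38, 27), (118.26, 38, 93), (39.62, 60, 27), (45.68, 71, 27),
    (149.34, 71, 100), (41.97, 82, 5), (59.48, 93, 27), (148.58, 93, 93),
    (163.14, 93, 100), (73.14, 100, 38), (114.06, 100, 71), (153.44, 100, 93),
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def least_squares(rows, y):
    """Build the normal equations for the regressors in rows and solve them."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(A, rhs)

# Regress log X1 on log X2 and log X3 (common logarithms, as in the text).
rows = [[1.0, math.log10(x2), math.log10(x3)] for _, x2, x3 in DATA]
y = [math.log10(x1) for x1, _, _ in DATA]
log_a, b2, b3 = least_squares(rows, y)
a = 10 ** log_a

calc = [a * x2 ** b2 * x3 ** b3 for _, x2, x3 in DATA]
avg_pct = sum(abs(100 * (x1 - c) / x1) for (x1, _, _), c in zip(DATA, calc)) / len(DATA)
print(f"X1 = {a:.3f} * X2^{b2:.5f} * X3^{b3:.5f}   avg |% dev| = {avg_pct:.1f}")
```

Note that the fit minimizes squared deviations of the logarithms, not of $X_1$ itself, which is one reason the percent deviations in the original units can remain large.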

How well this equation does the job is shown in Fig. 44 and Table
16. It is obviously no better than the linear relationship and
possibly even a little worse. The most striking shortcoming is that the
direction of curvature is wrong. Figures 41 and 42 indicate that the
required curve should be concave upwards, and these curves are concave
downwards. Did we make a mistake in arithmetic? No, there was no
mistake, except in the selection of the power function in the first
place. Figure 18 (the general shape of the power function for values
of x greater than or equal to 0) could have told us that we would get
what we did. This is another illustration of the value of the scatter
diagram and a knowledge of the basic properties of the functional
forms with which we are dealing. Consider a situation similar to this
one except that the fit is better. In such a case we might well have
used this relationship for extrapolating beyond the upper range of
the sample.

Table 16

RESULTS OF FITTING A THREE-VARIABLE POWER FUNCTION RELATIONSHIP
(log X1 = 0.16555 + 0.26963 log X2 + 0.73198 log X3)

                                                            Percent
Observation     X1      X2     X3   X1(calc.)  X1-X1(calc.) Deviation

     1         7.31      5      5       7.34      -0.03        -0.4
     2        37.67      5     49      39.01      -1.34        -3.6
     3        67.37      5     71      51.18      16.19        24.0
     4       121.31      5    100      65.76      55.55        45.8
     5        20.93     16     27      34.51     -13.58       -64.9
     6        24.77     27     27      39.74     -14.97       -60.4
     7        33.57     27     38      51.03     -17.46       -52.0
     8        22.78     38     16      29.71      -6.93       -30.4
     9        29.16     38     27      43.58     -14.42       -49.5
    10       118.26     38     93     107.75      10.51         8.9
    11        39.62     60     27      49.29      -9.67       -24.4
    12        45.68     71     27      51.57      -5.89       -12.9
    13       149.34     71    100     134.48      14.86        10.0
    14        41.97     82      5      15.60      26.37        62.8
    15        59.48     93     27      55.47       4.01         6.7
    16       148.58     93     93     137.15      11.43         7.7
    17       163.14     93    100     144.64      18.51        11.3
    18        73.14    100     38      72.64       0.50         0.7
    19       114.06    100     71     114.79      -0.73        -0.6
    20       153.44    100     93     139.86      13.58         8.9

                                                              24.3 av


It is true, however, that the exponential has the general
property we desire; refer back to Fig. 14. The two-variable
exponentials, written in their more useful form, would be

\[ X_1 = e^{a + b_2 X_2} \]

and

\[ X_1 = e^{a + b_3 X_3}. \]

For further clarification on this point refer to the earlier section
on the properties of the exponential.

When the natural logarithms of each side of each equation are
taken, we have

\[ \ln X_1 = a + b_2 X_2 \]

and

\[ \ln X_1 = a + b_3 X_3, \]

which combine into the following three-variable equation as before:

\[ \ln X_1 = a + b_2 X_2 + b_3 X_3, \]

which is linear when the logarithm of $X_1$ is used in place of $X_1$.

The least-squares normal equations are as for the linear curve
with $\ln X_1$ substituted for $X_1$:

\[ \sum \ln X_1 = Na + b_2 \sum X_2 + b_3 \sum X_3, \]

\[ \sum X_2 \ln X_1 = a \sum X_2 + b_2 \sum X_2^2 + b_3 \sum X_2 X_3, \]

\[ \sum X_3 \ln X_1 = a \sum X_3 + b_2 \sum X_2 X_3 + b_3 \sum X_3^2. \]

The resulting equation is

\[ \ln X_1 = 2.509 + 0.0092732X_2 + 0.019415X_3, \]

or

\[ X_1 = e^{2.509 + 0.0092732X_2 + 0.019415X_3}. \]
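The exponential fit differs from the linear one only in that $\ln X_1$ replaces $X_1$ as the dependent variable. The Python sketch below shows the substitution on the Table 15 observations; the helper names are illustrative assumptions, and agreement with the printed coefficients should be expected only to within rounding.

```python
import math

# Observations from Table 15: (X1, X2, X3).
DATA = [
    (7.31, 5, 5), (37.67, 5, 49), (67.37, 5, 71), (121.31, 5, 100),
    (20.93, 16, 27), (24.77, 27, 27), (33.57, 27, 38), (22.78, 38, 16),
    (29.16, 38, 27), (118.26, 38, 93), (39.62, 60, 27), (45.68, 71, 27),
    (149.34, 71, 100), (41.97, 82, 5), (59.48, 93, 27), (148.58, 93, 93),
    (163.14, 93, 100), (73.14, 100, 38), (114.06, 100, 71), (153.44, 100, 93),
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def least_squares(rows, y):
    """Build the normal equations for the regressors in rows and solve them."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(A, rhs)

# Regress ln X1 on X2 and X3.
rows = [[1.0, x2, x3] for _, x2, x3 in DATA]
y = [math.log(x1) for x1, _, _ in DATA]
a, b2, b3 = least_squares(rows, y)

calc = [math.exp(a + b2 * x2 + b3 * x3) for _, x2, x3 in DATA]
avg_pct = sum(abs(100 * (x1 - c) / x1) for (x1, _, _), c in zip(DATA, calc)) / len(DATA)
print(f"ln X1 = {a:.3f} + {b2:.7f} X2 + {b3:.6f} X3   avg |% dev| = {avg_pct:.1f}")
```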


Just how well this equation fits the data is shown in Table 17
and Fig. 45. We note from observing the scatter diagrams and the
average percent deviations that the exponential relationship comes
closer to fitting the data than does either the linear or the power
function. The direction of curvature is as we predicted. However,
while things are progressing, the exponential leaves much variation
to be explained.

Table 17

RESULTS OF FITTING A THREE-VARIABLE EXPONENTIAL RELATIONSHIP
(ln X1 = 2.509 + 0.0092732 X2 + 0.019415 X3)

                                                            Percent
Observation     X1      X2     X3   X1(calc.)  X1-X1(calc.) Deviation

     1         7.31      5      5      14.19      -6.88       -94.1
     2        37.67      5     49      33.34       4.33        11.5
     3        67.37      5     71      51.10      16.27        24.2
     4       121.31      5    100      89.74      31.57        26.0
     5        20.93     16     27      24.08      -3.15       -15.1
     6        24.77     27     27      26.67      -1.90        -7.7
     7        33.57     27     38      33.02       0.55         1.6
     8        22.78     38     16      23.86      -1.08        -4.7
     9        29.16     38     27      29.54      -0.38        -1.3
    10       118.26     38     93     106.38      11.88        10.1
    11        39.62     60     27      36.22       3.40         8.6
    12        45.68     71     27      40.11       5.57        12.2
    13       149.34     71    100     165.49     -16.15       -10.8
    14        41.97     82      5      28.98      12.99        31.0
    15        59.48     93     27      49.19      10.29        17.3
    16       148.58     93     93     177.15     -28.57       -19.2
    17       163.14     93    100     202.94     -39.80       -24.4
    18        73.14    100     38      64.98       8.16        11.2
    19       114.06    100     71     123.32      -9.26        -8.1
    20       153.44    100     93     189.03     -35.59       -23.2

                                                              18.1 av

Another curve which, in general, has the desired properties (at
least in part) is the parabola of the form

\[ y = a + bx + cx^2. \]

The earlier section on the parabola provided a complete discussion
of this equation. This equation is in two variables, however, and for
our purposes we need one in three. Fortunately, we may proceed as
before. The two-variable equations are

\[ X_1 = a + b_2 X_2 + c_2 X_2^2 \]

and

\[ X_1 = a + b_3 X_3 + c_3 X_3^2, \]

which combined, form

\[ X_1 = a + b_2 X_2 + c_2 X_2^2 + b_3 X_3 + c_3 X_3^2. \]

Notice that instead of two independent variables, $X_2$ and $X_3$, we
now have four variables: $X_2$, $X_2^2$, $X_3$, and $X_3^2$. Fortunately
$X_2^2$ and $X_3^2$ may be calculated given $X_2$ and $X_3$, so that we
have a special case of fitting what is essentially a five-variable
linear relationship. The least-squares normal equations follow:

\[ \sum X_1 = Na + b_2 \sum X_2 + c_2 \sum X_2^2 + b_3 \sum X_3 + c_3 \sum X_3^2, \]

\[ \sum X_1 X_2 = a \sum X_2 + b_2 \sum X_2^2 + c_2 \sum X_2^3 + b_3 \sum X_2 X_3 + c_3 \sum X_2 X_3^2, \]

\[ \sum X_1 X_2^2 = a \sum X_2^2 + b_2 \sum X_2^3 + c_2 \sum X_2^4 + b_3 \sum X_2^2 X_3 + c_3 \sum X_2^2 X_3^2, \]

\[ \sum X_1 X_3 = a \sum X_3 + b_2 \sum X_2 X_3 + c_2 \sum X_2^2 X_3 + b_3 \sum X_3^2 + c_3 \sum X_3^3, \]

\[ \sum X_1 X_3^2 = a \sum X_3^2 + b_2 \sum X_2 X_3^2 + c_2 \sum X_2^2 X_3^2 + b_3 \sum X_3^3 + c_3 \sum X_3^4. \]

Manual solution of this set of equations is lengthy at best.
Consequently, one of the many computer programs available should
probably be used. With a computer, the task becomes a simple one, and
the chance of making errors in arithmetic is minimal. In the case of
our example the derived equation is

\[ X_1 = 5.006 + 0.2498X_2 + 0.002301X_2^2 + 0.1499X_3 + 0.01000X_3^2. \]

Table 18 and Fig. 46 indicate that we have indeed found the correct
empirical equation. However, even with fits as good as this one,
unless there is a logical base for the particular equation,
extrapolations beyond the range of the data should be made with
extreme caution. Such is particularly true when the relationship is a
parabola. (Review the section of this Memorandum on the general
properties of parabolas.)

Table 18

RESULTS OF FITTING A THREE-VARIABLE PARABOLIC RELATIONSHIP
(X1 = 5.006 + 0.2498 X2 + 0.002301 X2^2 + 0.1499 X3 + 0.01000 X3^2)

                                                            Percent
Observation     X1      X2     X3   X1(calc.)  X1-X1(calc.) Deviation

     1         7.31      5      5       7.31       --          --
     2        37.67      5     49      37.67       --          --
     3        67.37      5     71      67.37       --          --
     4       121.31      5    100     121.31       --          --
     5        20.93     16     27      20.93       --          --
     6        24.77     27     27      24.77       --          --
     7        33.57     27     38      33.57       --          --
     8        22.78     38     16      22.78       --          --
     9        29.16     38     27      29.16       --          --
    10       118.26     38     93     118.26       --          --
    11        39.62     60     27      39.62       --          --
    12        45.68     71     27      45.68       --          --
    13       149.34     71    100     149.34       --          --
    14        41.97     82      5      41.97       --          --
    15        59.48     93     27      59.48       --          --
    16       148.58     93     93     148.58       --          --
    17       163.14     93    100     163.14       --          --
    18        73.14    100     38      73.14       --          --
    19       114.06    100     71     114.06       --          --
    20       153.44    100     93     153.44       --          --
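Treating $X_2^2$ and $X_3^2$ as additional regressors turns the parabolic fit into a five-coefficient linear problem, which a computer handles readily. The Python sketch below illustrates the idea on the Table 15 observations; the helper names are illustrative assumptions, and the coefficients it prints should match the derived equation only to within rounding.

```python
# Observations from Table 15: (X1, X2, X3).
DATA = [
    (7.31, 5, 5), (37.67, 5, 49), (67.37, 5, 71), (121.31, 5, 100),
    (20.93, 16, 27), (24.77, 27, 27), (33.57, 27, 38), (22.78, 38, 16),
    (29.16, 38, 27), (118.26, 38, 93), (39.62, 60, 27), (45.68, 71, 27),
    (149.34, 71, 100), (41.97, 82, 5), (59.48, 93, 27), (148.58, 93, 93),
    (163.14, 93, 100), (73.14, 100, 38), (114.06, 100, 71), (153.44, 100, 93),
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def least_squares(rows, y):
    """Build the normal equations for the regressors in rows and solve them."""
    k = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(A, rhs)

# Regressors: 1, X2, X2^2, X3, X3^2 (the squares are computed, not observed).
rows = [[1.0, x2, x2 ** 2, x3, x3 ** 2] for _, x2, x3 in DATA]
y = [x1 for x1, _, _ in DATA]
a, b2, c2, b3, c3 = least_squares(rows, y)

resid = [x1 - (a + b2 * x2 + c2 * x2 ** 2 + b3 * x3 + c3 * x3 ** 2)
         for x1, x2, x3 in DATA]
max_resid = max(abs(r) for r in resid)
print(f"X1 = {a:.3f} + {b2:.4f} X2 + {c2:.6f} X2^2 + {b3:.4f} X3 + {c3:.5f} X3^2")
print(f"largest absolute deviation: {max_resid:.3f}")
```

The near-zero deviations are consistent with the blank deviation columns of Table 18; they say nothing about how the equation behaves outside the range of the data.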

We have illustrated the combination of two similar two-variable
relationships to form a single three-variable relationship. In fact,
certain dissimilar two-variable relationships may also be combined.
For example,

\[ X_1 = a + b_2 X_2 \]

may be combined with

\[ X_1 = a + b_3 X_3 + c_3 X_3^2 \]

to form

\[ X_1 = a + b_2 X_2 + b_3 X_3 + c_3 X_3^2. \]

If it were observed that $X_1$ varied linearly with $X_2$ and
nonlinearly with $X_3$, an equation like the one above might then be an
appropriate choice.

Appendix A

DERIVATION OF THE NORMAL EQUATIONS FOR A
LEAST-SQUARES FIT OF A STRAIGHT LINE, A PARABOLA,
AND A THREE-VARIABLE LINEAR EQUATION


A Straight Line

In curve fitting the general equation of a straight line is

\[ \hat{y} = \hat{\alpha} + \hat{\beta}x, \]

where $\hat{\alpha}$ and $\hat{\beta}$ are the parameters to be
determined such that the sum of the squares of the deviations from the
resulting line is a minimum. The carets are placed over those values
that are to be estimates. If we let each value of the dependent
variable be represented by $y_i$, with the subscript assigned according
to the data point we are using, we can write the general formula for
the deviations $d_i$ as

\[ d_i = y_i - \hat{y}_i. \]

On substituting the expression for $\hat{y}$ we have

\[ d_i = y_i - (\hat{\alpha} + \hat{\beta}x_i). \]

The squared deviations are

\[ d_i^2 = [y_i - (\hat{\alpha} + \hat{\beta}x_i)]^2, \]

which on expansion becomes

\[ d_i^2 = y_i^2 - 2\hat{\alpha}y_i - 2\hat{\beta}x_i y_i + \hat{\alpha}^2 + 2\hat{\alpha}\hat{\beta}x_i + \hat{\beta}^2 x_i^2. \]

We need such an expression because our interest is in minimizing
the sum of the squared deviations. The expression that follows
represents symbolically the summation of the above expression across
all values of $i$ from 1 to $n$, where 1 would represent the first data
point, 2 the second, and $n$ the last:

\[ \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} y_i^2 - 2\hat{\alpha}\sum_{i=1}^{n} y_i - 2\hat{\beta}\sum_{i=1}^{n} x_i y_i + n\hat{\alpha}^2 + 2\hat{\alpha}\hat{\beta}\sum_{i=1}^{n} x_i + \hat{\beta}^2\sum_{i=1}^{n} x_i^2. \]

From calculus we know that, for this expression to be a minimum, the
partial derivatives of $\sum d_i^2$ taken with respect to $\hat{\alpha}$
and $\hat{\beta}$ must be equal to 0. It can also be shown that this is
a sufficient condition for the above expression to be a minimum.* Thus
the procedure is to obtain these two partial derivatives and to equate
them to 0. The partial derivatives are

\[ \frac{\partial\left(\sum d_i^2\right)}{\partial\hat{\alpha}} = -2\sum_{i=1}^{n} y_i + 2n\hat{\alpha} + 2\hat{\beta}\sum_{i=1}^{n} x_i; \]

\[ \frac{\partial\left(\sum d_i^2\right)}{\partial\hat{\beta}} = -2\sum_{i=1}^{n} x_i y_i + 2\hat{\alpha}\sum_{i=1}^{n} x_i + 2\hat{\beta}\sum_{i=1}^{n} x_i^2. \]

After equating each of these to 0 and simplifying, we have

\[ \sum_{i=1}^{n} y_i = n\hat{\alpha} + \hat{\beta}\sum_{i=1}^{n} x_i; \]

\[ \sum_{i=1}^{n} x_i y_i = \hat{\alpha}\sum_{i=1}^{n} x_i + \hat{\beta}\sum_{i=1}^{n} x_i^2, \]

which are the required normal equations. All of the information
indicated both by the summation signs and by $n$ can be determined
directly from the data; this will result in two equations in two
unknowns ($\hat{\alpha}$ and $\hat{\beta}$) which can be solved for
simultaneously.

*The condition of sufficiency applies to any function that is
linear with respect to all of its parameters, such as the parabola.
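The two normal equations for the straight line can be solved in closed form. The Python sketch below does so for a small set of points; the function name `fit_line` and the sample data are hypothetical and chosen only so that the answer is easy to check by eye.

```python
def fit_line(xs, ys):
    """Least-squares line y = alpha + beta*x from the two normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Normal equations:  sy  = n*alpha  + beta*sx
    #                    sxy = alpha*sx + beta*sxx
    beta = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    alpha = (sy - beta * sx) / n
    return alpha, beta

# Hypothetical data lying exactly on y = 2 + 3x.
xs = [1, 2, 3, 4]
ys = [5, 8, 11, 14]
alpha, beta = fit_line(xs, ys)
print(alpha, beta)
```

Because the sample points lie exactly on a line, the fitted parameters recover that line; with scattered data the same formulas give the line that minimizes the sum of squared deviations.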

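A Parabola

The general equation of a parabola is

\[ \hat{y} = \hat{\alpha} + \hat{\beta}x + \hat{\gamma}x^2, \]

where $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ are the
parameters to be determined.

An individual squared deviation may be represented as follows:

\[ d_i^2 = [y_i - (\hat{\alpha} + \hat{\beta}x_i + \hat{\gamma}x_i^2)]^2, \]

which when expanded is

\[ d_i^2 = y_i^2 - 2\hat{\alpha}y_i - 2\hat{\beta}x_i y_i - 2\hat{\gamma}x_i^2 y_i + \hat{\alpha}^2 + 2\hat{\alpha}\hat{\beta}x_i + 2\hat{\alpha}\hat{\gamma}x_i^2 + \hat{\beta}^2 x_i^2 + 2\hat{\beta}\hat{\gamma}x_i^3 + \hat{\gamma}^2 x_i^4. \]

The sum of the squared deviations taken from $i = 1$ to $n$ is

\[ \sum_{i=1}^{n} d_i^2 = \sum y_i^2 - 2\hat{\alpha}\sum y_i - 2\hat{\beta}\sum x_i y_i - 2\hat{\gamma}\sum x_i^2 y_i + n\hat{\alpha}^2 + 2\hat{\alpha}\hat{\beta}\sum x_i + 2\hat{\alpha}\hat{\gamma}\sum x_i^2 + \hat{\beta}^2\sum x_i^2 + 2\hat{\beta}\hat{\gamma}\sum x_i^3 + \hat{\gamma}^2\sum x_i^4. \]

To minimize, we take the partial derivatives with respect to
$\hat{\alpha}$, $\hat{\beta}$, and $\hat{\gamma}$ and equate them to 0
as follows:

\[ \frac{\partial\left(\sum d_i^2\right)}{\partial\hat{\alpha}} = -2\sum y_i + 2n\hat{\alpha} + 2\hat{\beta}\sum x_i + 2\hat{\gamma}\sum x_i^2 = 0, \]

\[ \frac{\partial\left(\sum d_i^2\right)}{\partial\hat{\beta}} = -2\sum x_i y_i + 2\hat{\alpha}\sum x_i + 2\hat{\beta}\sum x_i^2 + 2\hat{\gamma}\sum x_i^3 = 0, \]

\[ \frac{\partial\left(\sum d_i^2\right)}{\partial\hat{\gamma}} = -2\sum x_i^2 y_i + 2\hat{\alpha}\sum x_i^2 + 2\hat{\beta}\sum x_i^3 + 2\hat{\gamma}\sum x_i^4 = 0. \]

On simplifying, we have

\[ \sum y_i = n\hat{\alpha} + \hat{\beta}\sum x_i + \hat{\gamma}\sum x_i^2, \]

\[ \sum x_i y_i = \hat{\alpha}\sum x_i + \hat{\beta}\sum x_i^2 + \hat{\gamma}\sum x_i^3, \]

\[ \sum x_i^2 y_i = \hat{\alpha}\sum x_i^2 + \hat{\beta}\sum x_i^3 + \hat{\gamma}\sum x_i^4. \]

These are the normal equations for fitting a parabola using the
least-squares criterion. The sums and sums of products are calculated
directly from the data and substituted into the normal equations,
leaving three equations and three unknowns. These three equations are
solved simultaneously for $\hat{\alpha}$, $\hat{\beta}$, and
$\hat{\gamma}$. As for the straight line, the solutions are unique and
exact for all $x$ and $y$.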

A  Three-variable  Linear  Equation 

The  general  form  of  the  linear  three-variable  equation  is 

y  "  “  +  Vl  +  ^2X2 ‘ 

The  derivation  procedure  is  identical  to  that  used  in  deriving  the 
normal  equations  for  the  straight  line  and  for  the  parabola.  The 
squared  deviations  are 

cf2  “  iy  ~  (at  +  +  ^2*r2 ^  ^  ‘ 
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The  above  expression  is  expanded  and  summed  across  all  the  data  points, 
and  the  partial  derivatives  of  the  summation  equation  with  respect  to 

a,  B,  »  and  6~  are  taken  and  equated  to  0.  The  resulting  normal  equa- 

x  /X 

tions  are 


$$\sum y = n\alpha + \beta_1 \sum x_1 + \beta_2 \sum x_2,$$

$$\sum x_1 y = \alpha \sum x_1 + \beta_1 \sum x_1^2 + \beta_2 \sum x_1 x_2,$$

$$\sum x_2 y = \alpha \sum x_2 + \beta_1 \sum x_1 x_2 + \beta_2 \sum x_2^2 .$$

In the above equations the subscripts are used to distinguish between the two independent variables and their coefficients rather than to indicate the range of summation as before. Although it is not specifically indicated here, it should be understood that the sums are to be taken across all data points.
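As a rough illustration of how these three normal equations are used, the sketch below builds the sums from the data and solves for the coefficients. The names `solve_linear` and `fit_two_variables` and the sample data are hypothetical, chosen only for this example; Gaussian elimination is an assumed solution method, not one prescribed by the text.

```python
def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_two_variables(x1, x2, y):
    """Return (alpha, beta1, beta2) for y = alpha + beta1*x1 + beta2*x2."""
    n = len(y)
    # All sums are taken across every data point.
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    a = [[n, s1, s2],
         [s1, s11, s12],
         [s2, s12, s22]]
    b = [sy, s1y, s2y]
    return tuple(solve_linear(a, b))


# Hypothetical data generated exactly from y = 2 + 3*x1 - x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [2 + 3 * u - v for u, v in zip(x1, x2)]
alpha, beta1, beta2 = fit_two_variables(x1, x2, y)
```

If the two independent variables were perfectly collinear, the coefficient matrix would be singular and the sketch raises an error rather than returning arbitrary values.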


Appendix  B 

DERIVATION OF THE FORMULA FOR CALCULATING THE CONSTANT


The value of $k$, as is shown in Fig. 3-1, must be such that when the values of the y coordinates of points on $L_1$ are reduced by that amount, the new points fall on the line $L_2$. Further, $L_2$ must be linear in terms of logarithms.

For $L_2$ to be linear in terms of logarithms, the triangles ABC and BDE must be similar. In other words, $L_2$ must have a constant slope. It is this fact that provides the basis for calculating $k$.

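The slope of the triangle ABC is the ratio of its vertical side to its horizontal side BC, and the slope of the triangle BDE is the ratio of its vertical side to its horizontal side DE. The two slopes must be equal to each other, i.e.,

$$\frac{\text{vertical side of } ABC}{BC} = \frac{\text{vertical side of } BDE}{DE}. \qquad (1)$$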

When the coordinates of the appropriate points are used to calculate the lengths of the above line segments and the results are substituted in Eq. 1, we have

$$\frac{\log(y_1 - k) - \log(y_3 - k)}{\log x_1 - \log x_3} = \frac{\log(y_3 - k) - \log(y_2 - k)}{\log x_3 - \log x_2}. \qquad (2)$$


Since we are free to select the three points $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$ in any way we wish, we do so in such a way that the denominators of the two fractions in Eq. 2 are equal, such as

$$\log x_1 - \log x_3 = \log x_3 - \log x_2. \qquad (3)$$

General practice is to choose $x_1$ and $x_2$ at the extremities of $L_1$, and to let Eq. 3 determine the value of $x_3$, such as

$$2 \log x_3 = \log x_1 + \log x_2,$$

or

$$\log x_3 = \frac{\log x_1 + \log x_2}{2}. \qquad (4)$$


As can be seen, $\log x_3$ is the average of, or halfway between, $\log x_1$ and $\log x_2$.

Equation  4  in  arithmetic  form  is 


$$x_3 = \sqrt{x_1 x_2};$$

$x_3$ is seen to be the geometric mean of $x_1$ and $x_2$.

If $x_3$ is chosen in this way, Eq. 2 then reduces to

$$\log(y_1 - k) - \log(y_3 - k) = \log(y_3 - k) - \log(y_2 - k).$$


In  arithmetic  form  we  have 


$$\frac{y_1 - k}{y_3 - k} = \frac{y_3 - k}{y_2 - k},$$

which, solved for $k$, gives

$$k = \frac{y_1 y_2 - y_3^2}{y_1 + y_2 - 2 y_3}. \qquad (5)$$


Equation  5  is  the  desired  result. 
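The algebra between the logarithmic equality and Equation 5 is short, and it may help to see it written out. The steps below are a sketch that writes the constant as $k$; that symbol is an assumption made for this illustration, since the source's own symbol is not legible. The points $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$ are the three points already chosen above.

```latex
\begin{align*}
\log(y_1 - k) - \log(y_3 - k) &= \log(y_3 - k) - \log(y_2 - k) \\
\frac{y_1 - k}{y_3 - k} &= \frac{y_3 - k}{y_2 - k} \\
(y_1 - k)(y_2 - k) &= (y_3 - k)^2 \\
y_1 y_2 - k(y_1 + y_2) + k^2 &= y_3^2 - 2 k y_3 + k^2 \\
y_1 y_2 - y_3^2 &= k\,(y_1 + y_2 - 2 y_3) \\
k &= \frac{y_1 y_2 - y_3^2}{y_1 + y_2 - 2 y_3}
\end{align*}
```

The $k^2$ terms cancel, which is why the result is linear in $k$ and has a single solution whenever $y_1 + y_2 \neq 2 y_3$.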

If we had been concerned with the semi-log case, Eq. 2 would have been

$$\frac{\log(y_1 - k) - \log(y_3 - k)}{x_1 - x_3} = \frac{\log(y_3 - k) - \log(y_2 - k)}{x_3 - x_2},$$

and Eq. 4 would be

$$x_3 = \frac{x_1 + x_2}{2}.$$

We would therefore make $x_3$ the average of $x_1$ and $x_2$ instead of the geometric mean. Equation 5 applies as before.
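The calculation described in this appendix can be sketched as a pair of small Python functions. This is an illustration only: the names `middle_x` and `constant_k`, the symbol `k` for the constant, and the two test curves are all assumptions introduced here, not taken from the Memorandum. In practice $y_3$ would be read off the fitted or freehand curve at $x_3$; here it is computed from a known curve so the result can be checked.

```python
import math


def middle_x(x1, x2, semi_log=False):
    """Eq. 4: geometric mean of x1 and x2 (log-log case),
    or their arithmetic mean (semi-log case)."""
    if semi_log:
        return (x1 + x2) / 2
    return math.sqrt(x1 * x2)


def constant_k(y1, y2, y3):
    """Eq. 5: the constant to subtract from y, where y3 is the
    curve's y value at the middle x given by middle_x()."""
    denom = y1 + y2 - 2 * y3
    if denom == 0:
        raise ValueError("y1 + y2 - 2*y3 is zero; Eq. 5 is undefined")
    return (y1 * y2 - y3 ** 2) / denom


# Hypothetical log-log example: y = 5 + 2*x**0.5, so the constant is 5.
def power_curve(x):
    return 5 + 2 * x ** 0.5

x1, x2 = 1.0, 100.0
x3 = middle_x(x1, x2)  # geometric mean of the end points
k_loglog = constant_k(power_curve(x1), power_curve(x2), power_curve(x3))


# Hypothetical semi-log example: y = 3 + 2*exp(0.1*x), so the constant is 3.
def exp_curve(x):
    return 3 + 2 * math.exp(0.1 * x)

u1, u2 = 0.0, 10.0
u3 = middle_x(u1, u2, semi_log=True)  # arithmetic mean of the end points
k_semilog = constant_k(exp_curve(u1), exp_curve(u2), exp_curve(u3))
```

Only the choice of the middle x differs between the two cases; the same formula for the constant is applied afterward, as the text states.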